PREFACE


The work described in this report was performed by the Astrionics 
Division of the Jet Propulsion Laboratory. 

The research reported in this Technical Memorandum is a disserta- 
tion presented to and accepted by the Faculty of the Graduate School, 
University of Southern California, in partial fulfillment of the require- 
ments for the Degree Doctor of Philosophy (Electrical Engineering). 

The examples in this report pertain to pattern recognition of char- 
acters. However, the theory of multiclass sequential hypothesis test can 
be applied in other disciplines. The theory is useful in signal detection as 
well as in detection of objects by a robot, for instance. 
ABSTRACT


In recent years there has been a sharp rise in the
need for and interest in pattern recognition. In particular,
much work has been done on the problems of machine
reading. There are algorithms which partially solve the
problem of reading impact printed material. This
dissertation presents an algorithm which can be used to
build a reading machine that will read impact printed
characters and handwritten letters.

Invariant features are extracted by random lines.
The number of intersections and also the total length
of intersection that these lines produce are the random
variable observations used as inputs to a hypothesis
test. This method allows the pattern to be anywhere in
the retina. It eliminates the cost of fine alignment of
the pattern before taking samples. Many previous users
of these features utilized only the mean of the random
variable. Here the whole probability distribution of
the random variable is used. This allows the
introduction of size invariant methods.

The sequential multiclass hypothesis test presented
in this dissertation is in such a form as to allow rapid
computation of the errors of the first and second kinds
for each possible decision. This is useful because in
any practical system the user desires to have easy access
to the parameters which control the performance of the
machine.

One interpretation of this test is that it computes 
the ratio of the likelihood of an observation coming from 
a class to the likelihood of an observation coming from 
any other class. When this ratio exceeds a threshold,
a decision in favor of the class is made. For each
sample, there are as many such comparisons as there are
classes.

The sequential multiclass hypothesis test proposed
in this dissertation is a Bayes test at each step. For
the two-class problem the proposed test reduces to Wald's
sequential probability ratio test. It is not like the
generalized Wald's test which tests all combinations of two
hypotheses, nor is it like the M-hypothesis test which
also tests the same number of combinations. The number
of comparisons these tests make is (1/2)M(M-1), where M
is the number of classes. They require far more
computations than the proposed test.

Extensive experiments with block letters and hand- 
written numerals are reported. These experiments verify 
the usefulness of the proposed multiclass hypothesis test. 
These experiments show that the error rates are under the
control of the user and that the average length of the
test can be predicted.

A survey of the methods in pattern recognition is
presented to put the author's contribution in perspective.
CHAPTER I


INTRODUCTION 


A. THE PROBLEM

There are two aspects to pattern recognition. In 
one form of the problem a field of data is given to a
recognition machine and it is asked to state whether 
there are patterns. In the second form, the algorithm 
is required to decide under certain criteria which of 
the known patterns the data represent. (Often the null 
pattern or the reject option is included as a possible 
decision.) In this work the emphasis shall be on the 
solution of the second problem. 

Pattern recognition is a two step process. First,
observations are made, then an algorithm uses these
observations to arrive at a conclusion. Observations
include all forms of measurement, filtering, and
digitising. The decision algorithms may be linear,
nonlinear, or statistical functions of the observations.
There are abundant examples of such processes in nature.
One first hears sounds of speech, then understands their
properties. One must see the printed page before one
can read the words.
It is not the object here to study how these
processes operate in nature. But these examples clearly
point out the important interrelationship between the
observation and the algorithm. Hence to obtain a good
pattern recognition machine, it is required that the
observation and the decision algorithm be studied
together.

Often investigators in pattern recognition have
taken the observation phase of the process in an
expedient manner. By arbitrarily limiting the type of
observation, one severely narrows the possible class of
compatible or feasible decision algorithms. As an
example consider the early investigators who used the time
signal from a television-like scanner. The two
dimensional region of interest is divided in a
checkerboard manner and each square is assigned a gray level
according to the image. The choice of such a set of
n by m samples as the observation features is
unfortunate. Computational requirements on the large set of
numbers limit the types of algorithms.

The requirements of the problem often suggest a
class of decision algorithms. Then one must know how
to choose the best features. For instance, when the
requirements of the problem are stated in terms of
minimizing the average risk or in terms of the probability
of the recognition, certain statistical tests come to
mind. What remains is to find the feature extraction
scheme which will meet the needs of the statistical
methods consistent with computational and other
requirements.

An example of a practical problem in pattern
recognition with some of its requirements would be the design
of a machine which could read the address of a letter
and sort it according to the postman's route, with
assignable probability of the correctness of the sort.
It would reject as few as possible and sort at the
highest possible speed. So far the only "machine" that
comes close to meeting these requirements is man.

Exactly what features man extracts from the address
label is not known, nor is it known what algorithm he
uses to read written material. The motivation behind
developing a machine which will perform reading is that
the machine may be faster for a subset of "easy"
problems. It seems that the speed of the algorithm can be
enhanced if the algorithm is based on some random
sampling of the data rather than on some fixed extraction
such as contour tracing which takes more effort.

The pattern recognition system presented in this
thesis will use a statistical hypothesis test. The
method used in the observation phase is carefully chosen
to assure the stochastic nature of the input.


B. THE DISSERTATION 

This dissertation explores a solution to the pattern 
recognition problem which attains a requested performance 
level and which optimizes speed (amount of computation) 
and storage requirements. One may observe that a proba- 
bilistic decision machine is most natural to the require- 
ments of certain problems. Then suitable features are 
chosen as the input to the algorithm. 

This dissertation relies heavily on the problems
of character recognition for examples and illustrations.
Let it be noted that the ideas of randomized feature
extraction may be used for other types of problems.
For instance they may be used for feature extraction
of phonemes in audio signals.

Chapter II contains a survey of pattern recognition.
A few of the important tools used in pattern recognition
are presented to put this dissertation in perspective.
The works of certain investigators are discussed so that
the two steps in pattern recognition can be illustrated.
Multiclass hypothesis testing is discussed in Chapter II.
Maximum likelihood and Bayes procedures are reviewed.

In general it is difficult to compute the
significance of a test. That is, it is difficult to compute
how many samples are needed for a level of performance
because n-fold integrals are involved. Chapter III
describes a method of approximating the significance of
a test when the test is of a special form. Since the
significance of the test can be monitored easily for
each sample as it is observed, a sequential multiclass
hypothesis test results.

The requirements of the problem demand a stochastic
decision algorithm. Line intersection length and the
number of intersections of a random line with the figure
are presented as two invariant feature extraction
techniques. The properties of these features relevant to
font, size, and noise are discussed in Chapter IV.

Chapter V presents experimental results using the 
features of Chapter IV and the algorithms of Chapter III. 
Chapter V also includes the results of a recognition 
experiment of hand printed digits. 

C. NOTATIONS AND DEFINITIONS 

The notations and definitions used in this disser- 
tation are consistent throughout. A glossary is included 
at the beginning of this dissertation. 

In the problems considered in this work it is assumed
that there are $M = 2, 3, \ldots$ hypotheses. Only one
hypothesis is actually true. The ith hypothesis, denoted
$H_i$, shall be the proposition that the observations
$v = (v_1, v_2, \ldots, v_n)$ are taken from the ith class of
distribution $\tilde{F}_i$. The symbol above the function name (~)
allows the same name $F_i$ to be given to a family of
functions associated with a hypothesis. Strictly, the
functions $\tilde{F}_i(v_1, \ldots, v_{10})$ and
$\tilde{F}_i(v_1, \ldots, v_{55})$ are not the
same thing. $F_i$ will be used to denote the distribution
function of one variable, $F_i(v_m)$.

It is assumed that the distributions $F_i$ are distinct.
That is, $F_i \neq F_j$ if $i \neq j$. If the densities exist, then $dF_i$
is the probability density function.

The a priori probability that $H_i$ is true is

$$P_i = \operatorname{Prob}\{H_i \text{ is really true}\} \tag{1.1}$$

Clearly

$$P_i\,dF_i(v) = \operatorname{Prob}\{H_i \text{ is true and } v = (v_1, v_2, \ldots, v_n) \text{ is observed}\} \tag{1.2}$$

or

$$dF_i(v) = \operatorname{Prob}\{v = (v_1, v_2, \ldots, v_n) \text{ is observed given } H_i \text{ is true}\} \tag{1.3}$$

The algorithms considered here will be allowed to
verify one of the hypotheses or none at all. This last
decision is often called a reject.

$$D_0 = \text{reject} \tag{1.4}$$

$$D_i = \text{accept } H_i, \quad i = 1, 2, \ldots, M \tag{1.5}$$
The ratio of two probability densities will be named

$$Z_n(i,j) = \frac{dF_j(v_1, v_2, \ldots, v_n)}{dF_i(v_1, v_2, \ldots, v_n)} \tag{1.6}$$

When the samples are independent,

$$Z_n(i,j) = \prod_{m=1}^{n} \frac{dF_j(v_m)}{dF_i(v_m)} \tag{1.7}$$

$$= \prod_{m=1}^{n} z_m(i,j) \tag{1.8}$$

where

$$z_m(i,j) = \frac{dF_j(v_m)}{dF_i(v_m)} \tag{1.9}$$

Often the logarithms of Ratios 1.6 and 1.9 are useful.

$$\mathfrak{Z}_n(i,j) = \ln Z_n(i,j) \tag{1.10}$$

and

$$\mathfrak{z}_m(i,j) = \ln z_m(i,j) \tag{1.11}$$

Because the logarithm of a product is the sum of the
logarithms,

$$\mathfrak{Z}_n(i,j) = \sum_{m=1}^{n} \mathfrak{z}_m(i,j) \tag{1.12}$$
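As an illustration of the sum in (1.12), the following is a minimal sketch that accumulates per-sample log ratios for independent samples. The Gaussian densities, the sample values, and all function names are assumptions chosen for the example; they do not come from the dissertation.

```python
import math

def gaussian(mean, sigma=1.0):
    """Hypothetical class density used only for this illustration."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda v: math.exp(-0.5 * ((v - mean) / sigma) ** 2) / norm

def log_ratio_terms(samples, density_i, density_j):
    """Per-sample terms ln[dF_j(v_m) / dF_i(v_m)], as in (1.9) and (1.11)."""
    return [math.log(density_j(v) / density_i(v)) for v in samples]

def cumulative_log_ratio(samples, density_i, density_j):
    """Running sums of the per-sample terms, as in (1.12)."""
    total, sums = 0.0, []
    for term in log_ratio_terms(samples, density_i, density_j):
        total += term
        sums.append(total)
    return sums

# Class i has mean 0 and class j has mean 1 (assumed values).
samples = [0.9, 1.2, 0.4]
sums = cumulative_log_ratio(samples, gaussian(0.0), gaussian(1.0))
```

For unit-variance Gaussians with these means each term equals v - 0.5, so the running sum can be checked by hand. Working with the sum rather than the product of (1.8) avoids numerical underflow when many samples are taken.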
Any statistical decision algorithm is subject to
errors. The probabilities of the errors are given the
names

$$e_{ij} = \operatorname{Prob}\{\text{accepting } H_i \text{ when } H_j \text{ is true}\} = \operatorname{Prob}\{D_i \mid H_j \text{ true}\} \tag{1.13}$$

More precisely, let $\psi_i^{(n)}$ be the region in $v = (v_1, v_2, \ldots)$
such that a decision $D_i$ is made at the nth stage.

$$\{v \in \psi_i^{(n)}\} = \{\text{decision } D_i \text{ is made exactly when } n \text{ components of } v \text{ are observed}\} \tag{1.14}$$

Also let $e_{ij}^{(n)}$ be the probabilities of error for
decisions made with n samples. Then

$$e_{ij}^{(n)} = \int_{\psi_i^{(n)}} dF_j(v_1, v_2, \ldots, v_n) \tag{1.15}$$

The superscript is used to stress that there are n
components in the vector v. This is necessary to compute
the error probabilities for the sequential tests.

Let p(n) be the probability that the test ends at
the nth stage.

$$p(n) = \operatorname{Prob}\{\text{sequential test ends at the nth stage}\} \tag{1.16}$$

JPL Technical Memorandum 33-482



The total error rates are

$$e_{ij} = \sum_{n} e_{ij}^{(n)} \tag{1.17}$$

$$= \sum_{n} \int_{\psi_i^{(n)}} dF_j(v) \tag{1.18}$$

where the last equality is by Definition 1.15.

A table of $[e_{ij}]$ is called a confusion matrix. The
probability of correctly accepting the ith hypothesis
given that $H_i$ is true is

$$e_{ii} = \operatorname{Prob}\{D_i \mid H_i \text{ true}\} \tag{1.19}$$

Of course this is not an error, but the letter "e" is
used for consistency with the other entries of this
table. The probability that a decision algorithm will
correctly choose a hypothesis is

$$P_i e_{ii} = \operatorname{Prob}\{D_i \text{ and } H_i\} \tag{1.20}$$

This term appears frequently in subsequent chapters. It
will be called the probability of detection and given the
notation

$$\gamma_i = P_i e_{ii} \tag{1.21}$$


Two types of errors are of particular importance in
pattern recognition. The first is the probability that
the result of a classification is incorrect. The second
is the probability that, given a known pattern, the
algorithm will not correctly detect it.

An example of an application of a pattern
recognition algorithm will clarify this point. Suppose a
reading machine is scanning a typewritten page. If it
reports that the next letter is "Q" it is desirable to
know the probability that such a report is incorrect,
i.e., the machine is really observing another letter.
The probability of such an event is called the error
probability of the first kind, $\alpha_Q$. On the other hand, the
reading machine may be positioned over a known letter,
"B". The probability that the machine will correctly
identify the letter is the probability of detection, $\gamma_B$.
If there is a misclassification then there has been an
error of the second type. Its probability is $\beta_B$ and

$$\beta_B = P_B - \gamma_B \tag{1.22}$$

An algorithm may classify a given test pattern into
an incorrect class.

$$\alpha_i = \operatorname{Prob}\{D_i \text{ is incorrect}\} = \sum_{j \neq i} P_j e_{ij} \tag{1.23}$$

This is the probability of false declaration.


The $\{\alpha_i\}$ are defined for all $i = 0, 1, 2, \ldots, M$. There
are M classes and $i = 0$ corresponds to the null pattern
or the reject option. That is, $\alpha_0$ is the probability of
a reject occurring.

Another type of error is the probability of a miss.
That is, there is the probability that $H_i$ is true and
an incorrect decision $D_j$, where $j \neq i$, is made.

$$\beta_i = P_i \sum_{j \neq i} e_{ji} \tag{1.24}$$

The $\{\beta_i\}$ are meaningfully defined for all $i = 1, 2, \ldots, M$
but $\beta_0 = 0$.

It seems reasonable to characterize a pattern
recognition system in terms of $\{\alpha_i\}$ and $\{\gamma_i\}$. It is useful
to be able to find an algorithm at a specified level of
$\{\alpha_i\}$ and $\{\gamma_i\}$.
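The error measures of this section can be computed directly from a confusion matrix and the a priori probabilities. The sketch below assumes the table is stored with one row per decision (row 0 being the reject) and one column per true class, and it counts any outcome other than the correct decision as a miss, so that the relation of (1.22) holds. The numbers are made up for illustration.

```python
def error_rates(e, priors):
    """Compute false declaration, miss, and detection probabilities.

    e[i][j-1] = Prob{D_i | H_j} for decisions i = 0..M (row 0 is the reject)
    and true classes j = 1..M; priors[j-1] = P_j.
    """
    M = len(priors)
    # Probability of detection, as in (1.21).
    gamma = [priors[i - 1] * e[i][i - 1] for i in range(1, M + 1)]
    # Probability of false declaration: D_i made while some other H_j is true.
    alpha = [
        sum(priors[j - 1] * e[i][j - 1] for j in range(1, M + 1) if j != i)
        for i in range(0, M + 1)
    ]
    # Probability of a miss: H_i true and any decision other than D_i made.
    beta = [
        priors[i - 1] * sum(e[j][i - 1] for j in range(0, M + 1) if j != i)
        for i in range(1, M + 1)
    ]
    return alpha, beta, gamma

# Hypothetical two-class example; each column of e sums to one.
priors = [0.6, 0.4]
e = [
    [0.10, 0.05],  # reject
    [0.80, 0.15],  # decide class 1
    [0.10, 0.80],  # decide class 2
]
alpha, beta, gamma = error_rates(e, priors)
```

Here alpha has M + 1 entries (index 0 is the reject probability), while beta and gamma have M entries each.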
CHAPTER II


SURVEY 


A survey of the mathematical tools often used in 
pattern recognition and a few experiments in pattern 
recognition are presented in this chapter. The purpose 
is to put this dissertation in perspective. For other 
examples in this field the reader is directed to Nagy 
[Ref. 1] and to Pattern Recognition [Ref. 2]. 

A. MATHEMATICAL TOOLS

1. Linguistic Approach 

In the linguistic approach, the input features are
the strokes and the stroke locations. Without becoming
too involved, an example will be given.



Fig. 2-1. Basic Elements of a Two Dimensional Field
The set of all basic elements is called the alphabet.
Suppose the alphabet is as displayed in Figure 2-1. A 


few of the possible inputs are: 

a) c d c which represents H (2.1) 

b) a d b which represents A (2.2) 

c) c b a c which represents M (2.3) 

d) c b c which represents N (2.4) 


The action of the decision algorithm is much like
a compiler. It checks to see if the combination of the
input elements forms a pattern in a dictionary. To
perform this chore efficiently one uses all the
mathematics of context free language, graph theory, and
compiler theory [Ref. 3 and 4].
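The dictionary check described above can be sketched as a simple lookup. This is only a toy illustration built from the four example inputs (2.1) through (2.4); a practical system would use a grammar rather than an explicit table, and the names below are hypothetical.

```python
# Hypothetical dictionary built from examples (2.1)-(2.4); the element
# names a, b, c, d stand for the basic strokes of Fig. 2-1.
DICTIONARY = {
    ("c", "d", "c"): "H",
    ("a", "d", "b"): "A",
    ("c", "b", "a", "c"): "M",
    ("c", "b", "c"): "N",
}

def recognize(elements):
    """Return the pattern named by a sequence of basic elements, or None (reject)."""
    return DICTIONARY.get(tuple(elements))

result = recognize(["c", "b", "a", "c"])
```

A sequence that is not in the dictionary returns None, which plays the role of the reject decision.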

2. Linear Operations 

Many methods look upon the input x as a matrix or
a vector. Nilsson [Ref. 5, p. 79] discusses partitioning
of the observation space into classes. Andrews [Ref. 6]
on the other hand uses transform methods on the input.

The input is sometimes looked upon as a matrix
$X = [x_{ij}]$ and transformations upon X are performed.

$$Z = P X Q \tag{2.5}$$

Functions of Z are used in the decision algorithm.

Andrews [Ref. 6] uses cross-correlation of a letter
prototype against a field of letters to find the matching
letters. To optimize on speed of computation the Fourier
transform of $[x_{ij}]$ is used. The transform of the input
and the transform of the reference are multiplied
together, thus giving a matched filter operation. This
method requires huge amounts of computation and is
sensitive to rotation as well as to scale variations.

Often a class is defined by several prototypes
$G_1^{(i)}, G_2^{(i)}, \ldots$. The superscript (i) says that the
prototype belongs to class i. A prototype is described
by a vector of measurements or features

$$G_j^{(i)} = (g_{j1}^{(i)}, g_{j2}^{(i)}, \ldots, g_{jn}^{(i)}) \tag{2.6}$$

Therefore the prototypes are points in the n-space of
features.

One method of recognizing an observed sample
$X = (x_1, x_2, \ldots, x_n)$ is to classify it into the class of
the "closest" prototype.

Many functions have been used to measure the
closeness of two points in the n-space. The Euclidean
distance

$$d^2(G_j, X) = (G_j - X) \cdot (G_j - X) \tag{2.7}$$

is one measure of the closeness of a prototype to the
sample.

The shortcomings of such a method are three-fold.
First, when there are many prototypes a large number of
computations are required. Also the error rates are
difficult to predict or control. Furthermore $d^2(G_j, X)$
depends upon the units chosen for the individual features:
X = (5 dollars, 4 inches, 6 volts) versus X = (500 cents,
10 cm, 6000 mv).
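The dependence on units can be seen in a small sketch of the closest-prototype rule. The prototypes, the sample, and the two features (taken here as dollars and inches) are hypothetical values chosen so that a change of units flips the decision.

```python
def squared_distance(g, x):
    """Squared Euclidean distance between a prototype g and a sample x."""
    return sum((gk - xk) ** 2 for gk, xk in zip(g, x))

def nearest_prototype(prototypes, x):
    """prototypes is a list of (class_label, feature_vector) pairs."""
    return min(prototypes, key=lambda p: squared_distance(p[1], x))[0]

def rescale(vector, scale):
    """Express each feature in new units by multiplying by a scale factor."""
    return tuple(s * v for s, v in zip(scale, vector))

# Hypothetical features: (dollars, inches).
prototypes = [("A", (5.0, 4.0)), ("B", (6.0, 1.0))]
x = (5.9, 3.5)
label_original = nearest_prototype(prototypes, x)

# Same data with the first feature expressed in cents.
scale = (100.0, 1.0)
prototypes_cents = [(label, rescale(g, scale)) for label, g in prototypes]
label_rescaled = nearest_prototype(prototypes_cents, rescale(x, scale))
```

With the first feature in dollars the sample is closest to the prototype of class A; with the same feature in cents it is closest to the prototype of class B, although nothing about the data has changed.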

One approach often used to "normalize" the n-space
of features is to find weight vectors

$$w^{(i)} = (w_1^{(i)}, w_2^{(i)}, \ldots, w_n^{(i)}) \tag{2.8}$$

constrained by the product

$$\prod_{j=1}^{n} w_j^{(i)} = 1 \tag{2.9}$$

or by the sum

$$\sum_{j=1}^{n} w_j^{(i)} = 1 \tag{2.10}$$

so that the intra-class distances $d^2(G_k^{(i)}, G_l^{(i)})$ are
minimized and the inter-class distances $d^2(G_k^{(i)}, G_l^{(j)})$
are maximized. The reason for doing this is that the
prototypes of one class ought to be "close" whereas
prototypes from different classes ought to be "distant".

Some investigators have attempted to measure the
distance between classes [Ref. 7]. One distance is
called the divergence and another the Bhattacharyya
distance. They are defined, respectively, for the two-class
problem as

$$J(H_1, H_2) = E_{H_1}\!\left[\ln \frac{dF_1(x)}{dF_2(x)}\right] - E_{H_2}\!\left[\ln \frac{dF_1(x)}{dF_2(x)}\right] \tag{2.11}$$

and

$$B(H_1, H_2) = -\ln \int_{-\infty}^{\infty} [dF_1(x)\,dF_2(x)]^{1/2}\,dx \tag{2.12}$$

where the probability distribution of a sample X over
the ith hypothesis is $F_i(X)$ and $E_{H_i}$ denotes the
expectation under $H_i$. These distances are not
metrics since the triangle inequality does not hold.
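For a feature that takes finitely many values the two distances reduce to sums, which gives a quick way to experiment with them. The sketch below is a discrete analogue of (2.11) and (2.12) under that assumption; the two probability vectors are made up and every entry is taken to be positive.

```python
import math

def divergence(p1, p2):
    """Discrete analogue of (2.11): E_1[ln(p1/p2)] - E_2[ln(p1/p2)]."""
    return sum((a - b) * math.log(a / b) for a, b in zip(p1, p2))

def bhattacharyya(p1, p2):
    """Discrete analogue of (2.12): -ln of the sum of sqrt(p1 * p2)."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p1, p2)))

# Hypothetical class distributions over two feature values.
p1 = [0.5, 0.5]
p2 = [0.9, 0.1]
j_value = divergence(p1, p2)
b_value = bhattacharyya(p1, p2)
```

Both quantities are symmetric in the two classes and vanish when the two distributions coincide.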

A tremendous amount of computation is involved in 
the determination of w, the weight vectors. And yet, 
such a method still leaves open the question of pre- 
dicting the performance of the classifier in terms of 
error probabilities. 

Nilsson uses hyperplanes to separate the classes
in the n-space of measurements. A linear discriminant
function for the ith class is formed by taking a dot
product of the input and a weight vector for each class.
This gives the discriminant function

$$d_i = X \cdot w^{(i)} \tag{2.13}$$

where the weight vector for the ith class is

$$w^{(i)} = (w_1^{(i)}, w_2^{(i)}, \ldots, w_n^{(i)}) \tag{2.14}$$

The i for which $d_i$ is the largest is chosen as the class.
A recursive method of choosing $w^{(i)}$ so that linearly
separable patterns can be partitioned is given in
Nilsson [Ref. 5, p. 75], under trainable linear
classifiers. A theorem given by Nilsson assures a partition
of the training set if the patterns are linearly
separable.

The drawback to this approach is that often classes 
are not linearly separable. Also error rates are ex- 
tremely difficult to compute. 
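The decision step of the linear discriminant approach, forming one dot product per class as in (2.13) and choosing the largest, can be sketched as follows. The weight vectors here are arbitrary illustrative values, not trained ones, and the training procedure itself is not shown.

```python
def discriminants(x, weights):
    """Dot product of the input with each class weight vector, as in (2.13)."""
    return [sum(xk * wk for xk, wk in zip(x, w)) for w in weights]

def classify(x, weights):
    """Return the 1-based index of the class with the largest discriminant."""
    d = discriminants(x, weights)
    return 1 + max(range(len(d)), key=d.__getitem__)

# Hypothetical weight vectors for three classes in a two-feature space.
weights = [(1.0, 0.0), (0.0, 1.0), (-1.0, -1.0)]
x = (1.0, 2.0)
chosen = classify(x, weights)
```

The decision costs one dot product per class, but nothing in it indicates how often the chosen class is wrong.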

3. Potential Functions 

Another approach to the assignment of points of a
finite-dimensional vector space to one of a family of
classes on the basis of prototypes of those classes is
called the method of potential functions [Ref. 8]. This
method reduces to the construction of functions $q_i(X)$,
one for each class, so that, if

$$q_j(X) > q_i(X) \quad \text{for all } i \neq j \tag{2.15}$$

then $X = (x_1, x_2, \ldots, x_n)$ is classified as a member of
class j, and where these functions are constructed as
superpositions of potential functions $f(X, G)$

$$q_i(X) = \frac{1}{m} \sum_{j=1}^{m} f(X, G_j^{(i)}) \tag{2.16}$$

The sum is over the prototypes of class i.

A reasonable set of restrictions on the potential
functions can be phrased in intuitive terms as:

a) f(X,Y) should be maximum for X = Y.

b) f(X,Y) should go to zero for X "distant" from Y.

c) f(X,Y) should be smooth for easy analytic
manipulations and decrease monotonically with the
"distance" between X and Y.

d) $f(X, G^{(i)}) = f(X, G^{(j)})$ should imply that X is
equally "similar" to the prototypes of class
i and j.

A function often used for f(X,Y) is

$$f(X,Y) = \frac{1}{1 + A\,d^2(X,Y)} \tag{2.17}$$

where A is some constant and $d^2(X,Y)$ is some distance
function. A form also used for the potential function is

$$f(X,Y) = A \exp\!\left(-\frac{\|X-Y\|^2}{2\sigma^2}\right) \tag{2.18}$$

where A and $\sigma$ are constants and $\|X-Y\|^2$ is the norm
square of the difference vector.

Clearly this function determines the way the space
is partitioned. For instance if $\sigma$ approaches 0, only
the prototypes will be defined to belong to the classes,
whereas when $\sigma$ approaches infinity increasing portions of the
feature space will be defined.
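A minimal sketch of classification by potential functions, using the exponential form of (2.18) and the class average of (2.16), is given below. The name sigma stands for the width constant of (2.18); the prototypes and the sample are hypothetical.

```python
import math

def gaussian_potential(x, y, amplitude=1.0, sigma=1.0):
    """Exponential potential of (2.18) between two points."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return amplitude * math.exp(-squared_norm / (2.0 * sigma ** 2))

def class_potential(x, prototypes, sigma):
    """Average potential over the prototypes of one class, as in (2.16)."""
    total = sum(gaussian_potential(x, g, sigma=sigma) for g in prototypes)
    return total / len(prototypes)

def classify(x, prototypes_by_class, sigma=1.0):
    """Choose the class whose potential at x is largest, as in (2.15)."""
    return max(
        prototypes_by_class,
        key=lambda c: class_potential(x, prototypes_by_class[c], sigma),
    )

# Hypothetical prototypes in a two-feature space.
prototypes_by_class = {
    "A": [(0.0, 0.0), (1.0, 0.0)],
    "B": [(4.0, 4.0)],
}
label = classify((0.5, 0.2), prototypes_by_class, sigma=1.0)
```

Every classification evaluates the potential against every stored prototype, which is the computational burden noted in the text.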
It is not clear how the error probabilities are
computed. Also the amount of computation one must carry
out for each classification is very large.

4. Statistical Methods 

The observations are often looked upon as random
variables. This allows statistical methods to be
applied to the classification problem. Usually some form
of a Bayes test is used, as in the maximum likelihood
classification technique. Such tests are optimum with
respect to certain loss functions. However, some authors
modify well known methods for computational or
experimental expediency. In so doing they lose the optimality
of the test, Reed's work described below being an example.

Both fixed length sample tests and sequential
techniques have been used in pattern recognition. In this
section many methods are discussed in detail with comments
as to the special needs of each technique. Where
appropriate, comments are made as to the inadequacy of the
method.

a. Definition of Bayes Decision Rule 

Bayes rule minimizes the average cost of making
decisions [Ref. 9, p. 24]. The average cost r is
computed as

$$r = \int_{\Gamma} \sum_{i=0}^{M} \delta(D_i \mid v) \left[ \sum_{j=1}^{M} L_{ij}\,dF_j(v)\,P_j \right] dv \tag{2.19}$$

where the decision rules are

$$\delta(D_i \mid v) = \operatorname{Prob}\{\text{deciding } D_i \text{ when observing } v\} \tag{2.20}$$

and $v \in \Gamma$. Clearly this integral is minimized if
$\delta(D_i \mid v) = 1$ for the i which gives the minimum

$$\sum_{j=1}^{M} L_{ij}\,dF_j(v)\,P_j \leq \min_{k} \sum_{j=1}^{M} L_{kj}\,dF_j(v)\,P_j \tag{2.21}$$

and

$$\delta(D_k \mid v) = 0 \quad \text{for } k \neq i \tag{2.22}$$

This formulation is general enough to include many
useful tests. The difficulty arises in choosing
meaningful values for the loss functions. In pattern
recognition applications a further difficulty is due to the
complexity of computing the error rates for various loss
functions. In Chapter III a method of choosing one
meaningful form of the loss function that allows easy
estimates of the errors is given.
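The minimization in (2.21) can be sketched for a single observation as follows. The loss table, the Gaussian class densities, and the a priori probabilities are assumptions made for the illustration; in particular the constant cost of 0.2 for a reject is arbitrary.

```python
import math

def gaussian(mean, sigma=1.0):
    """Hypothetical class density used only for this illustration."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda v: math.exp(-0.5 * ((v - mean) / sigma) ** 2) / norm

def bayes_decision(v, densities, priors, loss):
    """Return the decision index i that minimizes sum_j L_ij dF_j(v) P_j.

    loss[i][j-1] is the cost of decision D_i when H_j is true; row 0 is
    the reject decision, so the returned index is 0 for a reject.
    """
    weighted = [p * f(v) for p, f in zip(priors, densities)]
    risks = [sum(l * w for l, w in zip(row, weighted)) for row in loss]
    return min(range(len(risks)), key=risks.__getitem__)

densities = [gaussian(-1.0), gaussian(1.0)]
priors = [0.5, 0.5]
loss = [
    [0.2, 0.2],  # reject: small fixed cost whichever class is true
    [0.0, 1.0],  # decide class 1
    [1.0, 0.0],  # decide class 2
]
decisions = [bayes_decision(v, densities, priors, loss) for v in (-2.0, 0.0, 2.0)]
```

With these assumed costs the rule rejects near the midpoint between the two class means and accepts farther out. Changing the loss table moves the boundaries, which is why the choice of meaningful losses matters.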

b. Maximum Likelihood Decision 

A special form of the Bayes test is the maximum
likelihood decision rule. The criterion for decision
is that the probability of the observation coming from
the class be maximized. This is not necessarily the
best thing to do, as an example will illustrate.
However, under this criterion one chooses $D_i$ such that

$$\frac{P_i\,dF_i(x)}{\sum_{j=1}^{M} P_j\,dF_j(x)} = \max_{k} \frac{P_k\,dF_k(x)}{\sum_{j=1}^{M} P_j\,dF_j(x)} \tag{2.23}$$

Clearly the denominator is constant for a given x. The
rule is equivalent to choosing $D_i$ such that

$$P_i\,dF_i(x) = \max_{j} \{P_j\,dF_j(x)\} \tag{2.24}$$

It can be shown that this is the Bayes rule with
$L_{ij} = 1$, $i \neq j$. This rule classifies the observation without
regard to the type of error that it is making.
Consequently there is little control over the operating
characteristic of the algorithm.

As an example consider three hypotheses $H_1$, $H_2$ and
$H_3$ with $dF_1$, $dF_2$ and $dF_3$ as illustrated in Figure 2-2.

Fig. 2-2. Three Probability Densities

It is obvious that $D_3$ will never be chosen with this
algorithm, even when x has come from $H_3$.
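The effect can be reproduced numerically. The sketch below assumes three uniform densities, with class 1 on [-1, 0), class 2 on [0, 1) and class 3 spread over [-1, 1), and equal a priori probabilities. These shapes are an assumption made for the illustration and are not necessarily those of the original figure.

```python
def uniform(lo, hi):
    """Uniform density on [lo, hi); a hypothetical class density."""
    height = 1.0 / (hi - lo)
    return lambda x: height if lo <= x < hi else 0.0

def maximum_likelihood_decision(x, densities, priors):
    """Return the 1-based class index maximizing P_j dF_j(x), as in (2.24)."""
    scores = [p * f(x) for p, f in zip(priors, densities)]
    return 1 + max(range(len(scores)), key=scores.__getitem__)

densities = [uniform(-1.0, 0.0), uniform(0.0, 1.0), uniform(-1.0, 1.0)]
priors = [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0]

# Evaluate the rule on a grid covering the support of all three classes.
grid = [-1.0 + 0.01 * k for k in range(200)]
decisions = {maximum_likelihood_decision(x, densities, priors) for x in grid}
```

Wherever the class 3 density is positive, one of the other two densities is twice as large, so the set of decisions over the grid never contains class 3.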


c. Tradeoff of Recognition Probability against 
Reject Rate 

This section studies a method proposed by Chow
[Ref. 10]. It concludes that both the definition of
optimum and the proposed optimum rule are deficient.
The total error probability and the reject rate are often
used to characterise the performance of a pattern
recognition system. Chow describes a classification and
rejection rule based on these parameters.

Chow [Ref. 10 and 11] modifies the maximum likelihood
classification technique. Optimum here means that a rule
minimizes the reject probability for a given level of
total misclassification. The rule rejects the pattern
if the maximum of the likelihood function is less than
a threshold. The rule is defined as

$$\delta(D_i \mid x) = \begin{cases} 1 & \text{if } P_i\,dF_i(x) \geq P_j\,dF_j(x) \text{ for all } j = 1, 2, \ldots, k \\ & \text{and } P_i\,dF_i(x) \geq (1-t) \sum_{l=1}^{k} P_l\,dF_l(x) \\ 0 & \text{otherwise} \end{cases} \tag{2.25}$$

There is ambiguity when $P_i\,dF_i(x) = P_j\,dF_j(x)$.
When this happens a decision can be made randomly.
$$\delta(D_0 \mid x) = \begin{cases} 1 & \text{if } \delta(D_i \mid x) = 0 \text{ for all } i = 1, 2, \ldots, k \\ 0 & \text{if } \delta(D_i \mid x) = 1 \text{ for any } i = 1, 2, \ldots, k \end{cases} \tag{2.26}$$

The parameter $t \in [0,1]$ controls the reject region.
The error rate, reject rate and the probability of
correct recognition are defined, respectively, as


$$E(t) = \int_{R^n} \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} \delta(D_i \mid x)\,P_j\,dF_j(x) \tag{2.27}$$

$$R(t) = \int_{R^n} \delta(D_0 \mid x) \sum_{i=1}^{k} P_i\,dF_i(x) \tag{2.28}$$

$$C(t) = 1 - E(t) - R(t) \tag{2.29}$$

Now two useful functions can be defined as

$$m(x) = \frac{\max_i\,[P_i\,dF_i(x)]}{dF(x)} \tag{2.30}$$

and

$$dF(x) = \sum_{i=1}^{k} P_i\,dF_i(x) \tag{2.31}$$

The decision rule can be restated in terms of these
functions:

1) accept a pattern whenever $m(x) \geq 1-t$ (2.32)

2) reject the pattern whenever $m(x) < 1-t$ (2.33)

The region of acceptance $\psi_A$ and the region of rejection
$\psi_R$ can be defined in terms of these definitions:

$$\psi_A = \{x : m(x) \geq 1-t\} \tag{2.34}$$

$$\psi_R = \{x : m(x) < 1-t\} \tag{2.35}$$

Clearly the rejection and acceptance probabilities are

$$R(t) = \int_{\psi_R} dF(x) \tag{2.36}$$

$$A(t) = \int_{\psi_A} dF(x) \tag{2.37}$$

A few simple properties of the rejection threshold t are:

1) Both the error and reject rates are monotonic in t,
one decreasing and the other increasing.

2) The reject threshold t is an upper bound on the
error rate E(t). Let x be in $\psi_A$, the region
where $\delta(D_0 \mid x) = 0$. That is, $m(x) \geq (1-t)$ and

$$E(t) = A(t) - C(t) = \int_{\psi_A} [1 - m(x)]\,dF(x) \leq t \int_{\psi_A} dF(x) = t\,A(t) \leq t \tag{2.38}$$

3) The reject threshold t is a differential ratio
of error rate and reject rate when R(t) can be
differentiated.

$$\frac{dE}{dR} = -t \tag{2.39}$$

4) The probability of acceptance A(t) has the
property $0 \leq A(t) \leq 1$ for $t \in [0,1]$. $A(0) \geq 0$,
$A(1) = 1$.
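The first two properties can be checked numerically with a small sketch of the rule. Two unit-variance Gaussian classes with equal a priori probabilities are assumed, and the error and reject rates are estimated by a midpoint sum over a finite interval; all names and parameter values are illustrative.

```python
import math

def gaussian(mean, sigma=1.0):
    """Hypothetical class density used only for this illustration."""
    norm = sigma * math.sqrt(2.0 * math.pi)
    return lambda x: math.exp(-0.5 * ((x - mean) / sigma) ** 2) / norm

def chow_decision(x, densities, priors, t):
    """Return 0 for a reject, otherwise the 1-based index of the accepted class."""
    weighted = [p * f(x) for p, f in zip(priors, densities)]
    total = sum(weighted)
    best = max(range(len(weighted)), key=weighted.__getitem__)
    if total == 0.0 or weighted[best] < (1.0 - t) * total:
        return 0
    return best + 1

def error_and_reject_rates(densities, priors, t, lo=-8.0, hi=8.0, steps=4000):
    """Midpoint-sum estimates of the error rate E(t) and reject rate R(t)."""
    dx = (hi - lo) / steps
    error = reject = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        weighted = [p * f(x) for p, f in zip(priors, densities)]
        total = sum(weighted)
        if chow_decision(x, densities, priors, t) == 0:
            reject += total * dx
        else:
            # Accepted: the mass of every class except the chosen one is error.
            error += (total - max(weighted)) * dx
    return error, reject

densities = [gaussian(-1.0), gaussian(1.0)]
priors = [0.5, 0.5]
error_low, reject_low = error_and_reject_rates(densities, priors, t=0.1)
error_high, reject_high = error_and_reject_rates(densities, priors, t=0.3)
```

Raising t from 0.1 to 0.3 lowers the reject rate and raises the error rate, and in both cases the error rate stays below t.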

There is a question concerning optimality. This
method says, in effect, to raise a threshold (1-t) from
zero to an appropriate value so that the total
probability of error,

$$E(t) = \int_{R^n} \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} \delta(D_i \mid x)\,P_j\,dF_j(x) = \sum_{i=1}^{k} \sum_{\substack{j=1 \\ j \neq i}}^{k} e_{ij} P_j \tag{2.40}$$

equals a design criterion. One question that is
unanswered is whether there are many decision rules that
will give the same "optimality". This question arises
since the total error rate in the definition relies only
on the total of the probabilities. There are many sets
$\{e_{ij}\}$ which give the same sum. Clearly not all such
tests are optimal, from the user's point of view. This
may mean that wrong costs were chosen. Here is an
example that will point out the weakness of the above
method. Consider the three hypotheses as illustrated
in Figure 2-3. Assume that each class is equally likely.
According to the definition $3\,dF = dF_1 + dF_2 + dF_3$. This
is Equation 2.31, illustrated by Figure 2-4.
Fig. 2-3. Example for Chow's Method

Fig. 2-4. Illustration of Eq. 2.31

Since $\max_i P_i\,dF_i(x) = 1$ for $-1 < x < 1$, then $m(x) = 1/dF(x)$.


Fig. 2-5. Chow's Rejection Region

Figure 2-5 illustrates m(x), and (1-t) corresponds to
some value of the total probability of error. The reject
region is the interval in x for which m(x) < (1-t).
The reject region is the whole of the region over which
class 2 is defined. From the Bayes point of view this is
a result of minimizing the average cost with respect to
some cost. In particular it is like assigning a cost of
zero for misclassifying class 2. The unfortunate
conclusion is that class 2 will never be detected.

Thus there are two flaws to Chow's method. First,
"optimum" is ambiguously defined in terms of total
probabilities. Second, the "optimum" rule may never detect
certain classes, even when requested to do so.
Consequently it is important to ask whether it is possible to
devise a test which is defined in terms of the natural
parameters of the pattern recognition problem,
probability of false declarations $\{\alpha_i\}$ and probability of
detection $\{\gamma_i\}$. Are there tests that give results which
approximate the design criteria? In the next chapter
this question is answered in precise terms.

d. Wald's Sequential Probability Ratio Test 

Many applications use Wald's sequential
probability ratio test. This test assumes the samples are
from one of two classes. Samples from class i have a
distribution $F_i$ and samples from class k have a
distribution $F_k$.

Samples are taken one after the other. It is not
necessary to assume that they are independent samples.
However, such an assumption clearly reduces the
requirements for computation. After n samples are taken,
$x = (x_1, x_2, \ldots, x_n)$. A rejection takes place if

$$A_k < \frac{P_i\,dF_i(x)}{P_k\,dF_k(x)} < A_i \tag{2.41}$$

where $A_k$ and $A_i$ are limits which will be defined. If

$$P_i\,dF_i(x) \geq A_i\,P_k\,dF_k(x) \tag{2.42}$$

then it is decided that the sample must have come from
the class whose distribution is $F_i(x)$. The quantity $A_i$
must satisfy Equation 2.42. Integrating over $\psi_i$, the
region of $R^n$ which gives the decision $D_i$, gives

$$\int_{\psi_i} P_i\,dF_i(x) \geq A_i \int_{\psi_i} P_k\,dF_k(x) \tag{2.43}$$

The left side is the probability of correctly deciding
class i, whereas the right integral is $\alpha_i$, the probability
of deciding i when decision k is correct.

This is the traditional presentation. It was pointed
out to the author that a more careful study must be made.
In Chapter III $\alpha_i$ is defined more specifically.

$$\gamma_i \ge A_i \alpha_i \tag{2.44}$$

By neglecting the excess over the thresholds,

$$\gamma_i \approx A_i \alpha_i \tag{2.45}$$

or an approximation to the threshold $A_i$ is

$$A_i \approx \frac{\gamma_i}{\alpha_i} \tag{2.46}$$

Hence the quantities $A_i$ are chosen as functions of
the $\alpha$'s and $\gamma$'s.
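
The two-class procedure of Equations 2.41 to 2.46 can be
sketched in a few lines. The following is a minimal
illustration, not the implementation used in this report:
the names `sprt` and `gaussian_pdf` are hypothetical, the
samples are assumed independent with strictly positive
densities, and the lower limit is assumed to be the
reciprocal of the upper limit purely for the example.

```python
import math

def sprt(samples, pdf_i, pdf_k, prior_i, prior_k, a_lower, a_upper):
    """Two-class sequential probability ratio test (illustrative sketch).

    Sampling continues while a_lower < P_i f_i(x) / (P_k f_k(x)) < a_upper
    (cf. Eq. 2.41). Independence is assumed, so the joint density is a product.
    Returns (decision, n); decision is "i", "k", or None if samples run out.
    """
    log_ratio = math.log(prior_i) - math.log(prior_k)
    n = 0
    for n, x in enumerate(samples, start=1):
        log_ratio += math.log(pdf_i(x)) - math.log(pdf_k(x))
        if log_ratio > math.log(a_upper):   # Eq. 2.42: decide class i
            return "i", n
        if log_ratio < math.log(a_lower):   # lower limit crossed: decide class k
            return "k", n
    return None, n

def gaussian_pdf(mean, sigma=1.0):
    """Density of a normal distribution (used only for the example)."""
    def pdf(x):
        z = (x - mean) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    return pdf

gamma_i, alpha_i = 0.475, 0.025     # hypothetical design values
a_upper = gamma_i / alpha_i         # Eq. 2.46
a_lower = 1.0 / a_upper             # assumption made for this example only
decision, n = sprt([1.0] * 20, gaussian_pdf(1.0), gaussian_pdf(0.0),
                   0.5, 0.5, a_lower, a_upper)
print(decision, n)
```

Working in logarithms avoids underflow of the product of
densities when many samples are taken.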

For a discussion on termination of Wald's method
see [Ref. 12], Wilks [Ref. 13, pp. 482-497], and Selin
[Ref. 14, pp. 90-95]. In the above references and in
Appendix 3 the average sample number at termination is
computed.

Most techniques using this test take the data in 
a fixed manner. For geometrical data, quite often, the 
measurements are taken using scan-by-predetermined-lines 
or scan-by-matrix-digitization or edge followers. 

Unfortunately in many cases the inputs to the algorithm
do not take the assumed statistical form. Often
Gaussian assumptions are made for analytical
convenience when little is really known about the inputs.

e. Extension of Wald's Sequential Probability 
Ratio Test 

Wald and Sobel [Ref. 15] extended the hypothesis
test to the three-class problem. However, as the title
of their paper, "A Sequential Decision Procedure for
Choosing One of Three Hypotheses concerning the Unknown
Mean of a Normal Distribution," suggests, the problem
they solved is restricted to the normal distribution.

Barnard [Ref. 16] and Armitage [Ref. 17] have
extended Wald's sequential probability ratio test beyond
the two-class problem in a more general way. Armitage
is more concise, his work being an outline of Barnard's
studies.

In their studies there are k hypotheses $H_1, H_2, \ldots, H_k$.
Applying Wald's test to each pair of hypotheses, there
are (1/2)k(k-1) likelihood ratios

$$R_{ij} = \frac{P_i \, dF_i(x)}{P_j \, dF_j(x)} \quad \text{for all } i \ne j \tag{2.47}$$

And there are k ratios of the form

$$R_{ii} = 1 \tag{2.48}$$

The observations are taken sequentially until all the
inequalities in one of the k sets are simultaneously
satisfied. Accept hypothesis i if $R_{ij} > A_{ij}$ for each
j = 1,2,...,k, where $A_{ii}$ is made less than one. Two
hypotheses cannot be accepted simultaneously when the
$A_{ij}$ are chosen meaningfully.

This test terminates with probability one if the
variance of the distribution of $R_{ij}$ is finite. The
proof of this is in Barnard and Armitage.

Rewriting the condition for accepting the ith
hypothesis and neglecting the excess over the boundary,

$$P_i \, dF_i(x) = A_{ij} P_j \, dF_j(x) \tag{2.49}$$

Integrating over the correct decision region of the ith
hypothesis gives

$$A_{ij} P_j e_{ij} \approx P_i e_{ii} < P_i \tag{2.50}$$

where $e_{ij}$ is the probability of deciding i given $H_j$, and
where the right inequality is noted for convenient
overbounding of $e_{ij}$. Similar caution as in Equation 2.43
applies here. Hence

$$A_{ij} = \frac{P_i}{P_j e_{ij}} \tag{2.51}$$

is a rule for choosing the boundaries.

Recalling that $H_i$ is accepted when each $R_{ij} > A_{ij}$,
it is clear that if the ith hypothesis is accepted then

$$e_{ij} > e'_{ij} \quad \text{for all } j \ne i \tag{2.52}$$

where $e'_{ij}$ is the true error rate and $e_{ij}$ is the desired
error rate.

The desired false alarm rate is

$$\alpha_i = \sum_{j \ne i} P_j e_{ij} \tag{2.53}$$

An estimate of the actual false alarm rate is

$$\sum_{j \ne i} P_j e'_{ij} = \alpha'_i < \alpha_i \tag{2.54}$$

where $\alpha'_i$ is used to mean the probability of this test
result being false.

There are two difficulties with this procedure. The
first has to do with computational requirements and the
second concerns a priori knowledge of $\{e_{ij}\}$.

After each sample is taken, (1/2)k(k-1) ratios are
formed. Each ratio is compared to a level, so there
are again (1/2)k(k-1) tests. This algorithm requires an
amount of computation which grows as the square of the
number of classes. For 10 classes, 45 steps are required.
For 64 classes, 2016 steps are needed for each sample!
Such requirements proscribe real-time computation.
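
The counts quoted above follow directly from the number of
distinct pairs of hypotheses. A short check (the function
name `pairwise_ratio_count` is introduced here only for
illustration):

```python
def pairwise_ratio_count(k):
    """Number of distinct likelihood ratios formed from k hypotheses, (1/2)k(k-1)."""
    return k * (k - 1) // 2

for k in (10, 64):
    print(k, pairwise_ratio_count(k))
```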

The second difficulty with this method is that often
not all $\{e_{ij}\}$ are known. Sometimes it is of no concern
what the individual error rate $e_{ij}$ is. An example
illustrates this point. In character recognition, it
really does not matter what the probability of
misclassifying "Q" into "B" is. What matters is that once
"B" is announced, it be true with high probability.
Next, when "Q" is given to a machine it is desired that
the probability of it announcing "Q" be high. How the
misprobability is distributed is immaterial. Again,
$\alpha_i$ and $\gamma_i$ are the fundamental quantities of
pattern recognition.

f. Reed's Generalized Sequential Probability
Ratio Test

Reed [Ref. 18] proposed a ratio test for multiclass
problems. But Fu [Ref. 19, p. 176] points out
that, for more than two classes, it has not been shown
that the procedure is justified. The only grace of the
method is that if the number of classes is two, then the
method coincides with Wald's method.

In Reed's method a ratio

$$U_i(x) = \frac{P_i \, dF_i(x)}{\left[ \prod_{j=1}^{k} P_j \, dF_j(x) \right]^{1/k}}, \quad i = 1, 2, \ldots, k \tag{2.55}$$

is formed at each stage of a sample. The notation
$x = (x_1, x_2, \ldots, x_n)$ is used. The stopping boundaries
are $A_i$,


$$A_i = \frac{1 - e_{ii}}{\left[ \prod_{j=1}^{k} (1 - e_{ij}) \right]^{1/k}} \tag{2.56}$$

$U_i$ is compared to $A_i$ for every i, and $H_i$ is rejected
if $U_i < A_i$ for any such i. The number k is reduced by
an appropriate amount and the $U_i$ recomputed.

Analysis of this test behavior is not available
except in the two-class problem.

5. Geometric Probability

Geometric probability [Ref. 20] has to do with the
probabilities of certain basic geometric events such as
a line intersecting a convex figure. First an
appropriate measure is given to the basic elements. The
basic element of measure for a random line $s(P,\theta)$ is
assigned in Appendix 4. A uniform random line $s(P,\theta)$
is described by P and $\theta$ and the probability of such a
line is proportional to $dP \, d\theta$. Then the probability of
these events can be described as the integrals of the
measure over the event. Many results relate only to
convex figures. Of course in the real world, figures
are more often nonconvex. However when the figures are
convex many such probabilities are related in a simple
manner to the basic features of the figure. If C
represents the set of all random lines intersecting a convex
figure with total perimeter L and if $dP \, d\theta$ is the element
of measure for a random line then

$$\iint_C dP \, d\theta = L \ \text{(meters)} \tag{2.57}$$

In order for this to be a probability it must be
normalized, usually by the perimeter of the retina. When one
assigns zero or one to N, a random variable is formed
which indicates whether there is an intersection. This
equation becomes just

$$\mathcal{E}(N) = L \ \text{(meters)} \tag{2.58}$$

where

$$N = \begin{cases} 1 & \text{if } s(P,\theta) \text{ intersects } C \\ 0 & \text{if } s(P,\theta) \text{ does not intersect } C \end{cases} \tag{2.59}$$

For a complete proof of this see Kendall and Moran
[Ref. 20, pp. 58-59].
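
The normalized form of Equation 2.57 can be checked by
simulation. The sketch below is illustrative only: the
function name `hit_probability`, the circular retina
centered at the origin, and the square test figure are all
assumptions made for the example. A line is drawn by
choosing P uniformly on [0, R] and $\theta$ uniformly on
[0, 2$\pi$); the fraction of lines that cut a convex polygon
lying inside the retina should approach the polygon
perimeter divided by the retina perimeter 2$\pi$R.

```python
import math
import random

def hit_probability(vertices, retina_radius, trials=200_000, seed=0):
    """Estimate the probability that a uniform random line cuts a convex polygon.

    The line is x*cos(theta) + y*sin(theta) = p, with p uniform on
    [0, retina_radius] and theta uniform on [0, 2*pi). The polygon is assumed
    convex and contained in the circular retina centered at the origin.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        p = rng.uniform(0.0, retina_radius)
        theta = rng.uniform(0.0, 2.0 * math.pi)
        c, s = math.cos(theta), math.sin(theta)
        # Signed distances of the vertices from the line; the line cuts the
        # polygon exactly when the vertices do not all lie on one side.
        d = [x * c + y * s - p for x, y in vertices]
        if min(d) <= 0.0 <= max(d):
            hits += 1
    return hits / trials

square = [(-0.5, -0.5), (0.5, -0.5), (0.5, 0.5), (-0.5, 0.5)]  # perimeter L = 4
R = 2.0
estimate = hit_probability(square, R)
expected = 4.0 / (2.0 * math.pi * R)   # L divided by the retina perimeter
print(round(estimate, 3), round(expected, 3))
```

Translating the square inside the retina leaves the
estimate unchanged to within sampling error, which is the
position invariance exploited in later chapters.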

Another interesting result is that if the basic
element is taken as a point in the plane, and its measure
is $dx \, dy$, then

$$\iint_C dx \, dy = A \ \text{(meters}^2\text{)} \tag{2.60}$$

where A is the area. Again this must be normalized by
the retina area or by some other constant. Let E, a
random variable, equal zero when the point is outside
of C. It is equal to one when it is inside the figure.
The above integral becomes

$$\mathcal{E}(E) = A \ \text{(meters}^2\text{)} \tag{2.61}$$

Ball [Ref. 21] uses the moments of such random
variables to perform classification. Scale invariance
is obtained by raising such moments to appropriate
powers, then taking ratios. For instance the feature

$$\frac{\left[ \iint_C dP \, d\theta \right]^2 \ \text{(meters)}^2}{\iint_C dx \, dy \ \text{(meters}^2\text{)}} \tag{2.62}$$

is dimensionless. Consequently this feature is size
invariant. It is also suggested that moments of functions
of such basic random events be used as an input to a
recognition machine. Integrals like

$$\int_{\Omega} f(s(P,\theta)) \, dP \, d\theta, \quad (P,\theta) \in \Omega \tag{2.63}$$

are considered, where $s(P,\theta)$ is a random line and
$dP \, d\theta$ its measure.
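
By Equations 2.57 and 2.60 the feature of Equation 2.62
reduces, for a convex figure, to the squared perimeter
divided by the area. A brief numerical illustration of its
size invariance follows; the function name `shape_factor`
and the choice of test figures are assumptions made only
for this example.

```python
import math

def shape_factor(perimeter, area):
    """Dimensionless feature of Eq. 2.62 for a convex figure: L**2 / A."""
    return perimeter ** 2 / area

# Squares of different sizes give the same value, as do circles.
for side in (1.0, 2.5, 10.0):
    print("square", side, shape_factor(4.0 * side, side ** 2))
for radius in (1.0, 3.0):
    print("circle", radius, shape_factor(2.0 * math.pi * radius, math.pi * radius ** 2))
```

The value depends on the shape (16 for any square,
4$\pi$ for any circle) but not on the scale.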

The moments^1 are estimated with the aid of the weak
law of large numbers. That is,

$$\lim_{n \to \infty} \text{Prob}\left\{ \left| \frac{1}{n} \sum (\text{integrand}) - \mathcal{E}(\text{integrand}) \right| > \epsilon \right\} = 0 \tag{2.64}$$

where $\epsilon > 0$. This convergence may be slow, causing
errors in the estimate of the moment. When these numbers
are raised to powers, so that the dimensions will cancel,
uncertainties become greater^2. The conclusion is that
obtaining the moments by random samples is unsatisfactory.
Other methods similar to quadrature in numeric
integration are proposed.

^1 Ball does not use the random variables directly.
He uses the term integral geometry. Moments are integrals.

^2 The relative maximum absolute error of a product is
the sum of the relative maximum absolute errors of each
factor. Hence the powers of uncertain numbers become more
uncertain.

B. EXPERIMENTS IN PATTERN RECOGNITION 

These experiments are presented here to illustrate 
the measuring and classifying techniques that are often 
used. 

1. Random Features 

One of the earliest suggestions for the use of ran- 
dom lines in pattern recognition was made by Rubinstein 
[Ref. 22]. He used the average number of intersections 
that a random line makes with open angular figures to 
attempt the recognition of the type of intersection. 

Ball [Ref. 21] introduces geometric probability
to give the above method a firm foundation. Yet he too
uses only the estimates of the various means as the input
to classification. On the other hand Wong uses the
distribution of the random variables.

Wong [Ref. 23, pp. 335-546] uses random lines thrown
against geometric shapes such as squares, circles and
polygons to find the total length of intersection. This
feature is used in Wald's sequential probability ratio
test. He considers shapes that are similar in convex
area. A set of fifteen simple basic figures is used.
Results of five pair-wise tests are reported.

The first experiment reported in Chapter V is an 
extension of this work. The figures considered were 
complex — block letters H and U. 

The second experiment in Chapter V is a problem in 
multiclass classification. Pair-wise tests are avoided. 
All classes are considered at once using the algorithm 
of Chapter III. Furthermore the feature used in the 
experiment is the number of intersections a line makes 
with a pattern. The complete distribution of this fea- 
ture is used. 

2. Handwritten Character Classification 

Demonstrations have shown that even humans perform 
rather poorly in recognizing handwritten characters out 
of context. The general problem is very difficult. Work
has been done by Brain and Hart [Ref. 24] on special
types of handwritten characters: printing in confined
squares as on FORTRAN coding sheets [Fig. 2-6]. Their
feature extraction is in two steps. First each character
is quantized into a 24 x 24 matrix. The matrix is
compared with 84 edges in 9 translated positions. The
results of this edge detection are fed into a trainable
linear classifier^1. The training set consisted of 8000
characters from 10 writers. The reported error rate on
material obtained from writers not included in the
training set was about 20%. This is a rather poor result
considering the number of computations: 84 x 9 = 756
edge detections!

^1 Trainable linear classifiers are discussed
in Nilsson [Ref. 5, p. 79].

Fig. 2-6. Edge Detection of Geometric Figures

Segment analysis has been quite successful in pat- 
tern recognition of handwritten letters. Mori, et al. 
[Ref. 25] as well as Sheinberg [Ref. 26] have used ele- 
mentary strokes as input to a linguistic pattern recog- 
nizer. Some of the elements used are pictured in Figure 
2-7. Both groups of investigators have been successful 
and have machines on the market. Their operating char- 
acteristics are not quoted here due to the unknown ex- 
perimental standards. These characteristics are sen- 
sitive to the source of the experimental data. 

Fu [Ref. 19, p. 36] reports a handwritten character
classification system that makes 7% error. The number
of computations that it needs is far less than that
required above.



Fig. 2-8. Preselected Line Intersection

The feature which is extracted is the length that
the predetermined lines 1,2,...,8 make with the figure
[Fig. 2-8]. Therefore the measurement is $X_1, X_2, \ldots, X_8$.
To get the test it is assumed that $X_1, X_2, \ldots, X_8$ are
Gaussian, with a mean and variance depending on the
figure. The sequential probability ratio test [Ref. 12]
is employed. On a set of two characters, A and E,
decisions are made on the average after six measurements.
On a set of four characters, a b c d, a similar
experiment shows that the average number of measurements
needed to make decisions with 7% error rates is less than
ten [Ref. 19, p. 39].

It should be noted that this system is sensitive to
alignment and the algorithm falls apart if the centering
of the figure exceeds a tolerance. The foundation for
the Gaussian assumption is weak and the prediction of
the probability of error depends on this important
assumption.

3. Machine Produced Impact Printing 

A method widely used for feature extraction of 
machine produced impact printing is to scan the page in 
a zig-zag pattern [Fig. 2-9] or by a group of parallel 
lines [Fig. 2-10].

Fig. 2-10. Parallel Line Scan

The continuous data stream is matched-filtered
(usually digitally) with patterns already stored in a
machine. The most sophisticated machine on the market
is the IBM 1975 Optical Page Reader [Ref. 27], which
operates in the one-error-per-million region. This
performance is attained by checking for context when it
is "uncertain" about a character's classification. Also
it has a set of stored patterns for each font.

The IBM 1975 Optical Page Reader is operational and
is used by the Social Security Administration to digitize
quarterly employers' reports. It checks context by
looking through a dictionary of names. The IBM 1975
is a specialized machine which is prohibitively expensive
for most applications.

Other similar experiments have used matrix
digitization or edge followers to quantize the visual data.
The matrix scan or the scan-by-predetermined-lines
techniques are subject to alignment constraints. Character
registration also plays an important part in the machine's
performance. Skewed or smudged characters play a critical
role. It would be desirable if one could find a feature
extraction that is invariant to displacement, rotation,
and size.

CHAPTER III


PROPOSED SEQUENTIAL MULTICLASS HYPOTHESIS TEST 

In this chapter a sequential hypothesis test is
proposed. It is one solution to the multiclass
classification problem. Its attributes are:

a) Its performance (in terms of the error rate 
of the first kind and the probability of de- 
tection) can be controlled. 

b) At each step it is a Bayes test with a special 
structure for the loss function. 

c) It terminates almost surely. 

d) For the two-class problem it is a Wald's test. 

e) The average sample number required for a test
can be estimated.

The notation used throughout this chapter is defined 
in Chapter I, Section C. In the subsequent sections each 
of the properties listed above is derived or proven. 

The proposed sequential multiclass test will have
two forms well suited for rapid computation. But before
the algorithm is presented a few words concerning the
motivations for forming such a test will be given. Also
some interpretations of the algorithm will be given.

It was pointed out earlier that in pattern
recognition the important parameters are the error rates of
the first and second kinds, $\alpha_i$ and $\beta_i$, respectively.
That is, if a machine announces a decision, for instance
that a letter on a page is Q, the user wishes to have an
estimate of the probability of that declaration being
incorrect. In many applications it matters not what
letter is really there, if indeed an error has been made.
Hence, it may be reasonable to formulate the decision
algorithm on the basis of the likelihood of an observation
compared to some acceptable threshold. If there are M
classes, M successive comparisons may be made. In other
words, for each possible decision one can ask whether
the likelihood of the observation coming from a class
is sufficiently greater than the average likelihood of
the observation coming from any other class. More
precisely, decide $D_i$ if

$$\frac{\sum_{j \ne i} P_j \, dF_j(v_1, v_2, \ldots, v_n)}{P_i \, dF_i(v_1, v_2, \ldots, v_n)} < C_i \tag{3.1}$$

otherwise take more samples.

It turns out that this test is in a very convenient
form for the computation of $\alpha_i$ and $\beta_i$, as will be
shown in subsequent sections.

Clearly Equation 3.1 looks familiar. It is shown
that this equation collapses into Wald's test for the
two-class problem. Also, this form of the test looks
like the generalized M-hypothesis test in Van Trees
[Ref. 9, p. 48]. But unlike the M-hypothesis test for
which the error rates are difficult to compute, as stated
by Van Trees, the error rates for the proposed test are
simple to compute.

The proposed sequential multiclass hypothesis test
will have two forms. The observations $v_1, v_2, \ldots, v_{n-1}$
have been observed and the test has not yet terminated.
That is, the decision so far is $D_0$. The proposed
algorithm, first form, is as follows:

1) Take the nth sample $v_n$ and let $v = (v_1, v_2, \ldots, v_n)$.

2) Take another observation (declare $D_0$, n <- n+1,
go to Step 1), if for every i = 1,2,...,M

$$\sum_{j \ne i} P_j \, dF_j(v) > C_i P_i \, dF_i(v) \tag{3.2}$$

3) The test terminates (go to Step 4) if for
any i = 1,2,...,M

$$\sum_{j \ne i} P_j \, dF_j(v) < C_i P_i \, dF_i(v) \tag{3.3}$$

4) Choose to verify the $H_i$ which minimizes

$$\sum_{j \ne i} P_j \, dF_j(v) - C_i P_i \, dF_i(v) \tag{3.4}$$

5) Declare $D_i$. Terminate.

Step 4 above is intuitively appealing. It will be
shown that the ratio of the false alarm rate $\alpha_i$ and the
probability of detection $\gamma_i$ is less than the thresholds
used in the test.

$$C_i > \frac{\alpha_i}{\gamma_i} \tag{3.5}$$

Some manipulation will show that Step 4 directs the
algorithm to choose a class in the quickest way while
not overstepping that requirement.

In Appendix 1 an equivalent form of the test is
derived. The second form is more convenient for
computation. The proposed algorithm, second form, is as
follows:

1) Take the nth observation $v_n$ and let
$v = (v_1, v_2, \ldots, v_n)$.

2) If

$$\sum_{j=1}^{M} P_j \, dF_j(v) > \max_i \{ (C_i + 1) P_i \, dF_i(v) \} \tag{3.6}$$

take another sample (declare $D_0$, n <- n+1, go
to Step 1).

3) Otherwise choose to verify $H_i$ so that

$$(C_i + 1) P_i \, dF_i(v) = \max_j \{ (C_j + 1) P_j \, dF_j(v) \} \tag{3.7}$$
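
The second form of the test lends itself to a compact
program. The following is a minimal sketch under stated
assumptions, not the program used for the experiments of
Chapter V: the function name `multiclass_sequential_test`
and its helpers are hypothetical, the samples are assumed
independent so that the joint density is a product of
per-sample densities, and logarithms are used only to
avoid numerical underflow.

```python
import math

def _log(x):
    """Natural logarithm that maps 0 to -inf instead of raising."""
    return math.log(x) if x > 0.0 else -math.inf

def _log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-domain values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def multiclass_sequential_test(samples, priors, pdfs, thresholds):
    """Second form of the proposed test (illustrative sketch).

    priors[i] = P_i, pdfs[i](v) = density of one sample under class i,
    thresholds[i] = C_i. Class indices are zero-based here.
    Returns (i, n) when class i is chosen after n samples, or (None, n)
    if the samples run out while the decision is still D_0.
    """
    log_joint = [_log(p) for p in priors]               # log(P_i dF_i(v))
    log_gain = [math.log(c + 1.0) for c in thresholds]  # log(C_i + 1)
    n = 0
    for n, v in enumerate(samples, start=1):
        for i, pdf in enumerate(pdfs):
            log_joint[i] += _log(pdf(v))
        scores = [g + lj for g, lj in zip(log_gain, log_joint)]
        best = max(range(len(scores)), key=scores.__getitem__)
        # Eq. 3.6: keep sampling while the total exceeds the largest score.
        if _log_sum_exp(log_joint) > scores[best]:
            continue
        return best, n                                   # Eq. 3.7
    return None, n

# Hypothetical three-class example with a discrete feature in {0, 1, 2}.
pdfs = [lambda v, i=i: 0.8 if v == i else 0.1 for i in range(3)]
priors = [1.0 / 3.0] * 3
thresholds = [0.05] * 3
print(multiclass_sequential_test([2, 2, 2, 2], priors, pdfs, thresholds))
```

Only M products and one sum are formed per sample, in
contrast with the (1/2)k(k-1) ratios of the pairwise
procedure.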

A. DETERMINATION OF $\{\alpha_i\}$ AND $\{\gamma_i\}$ FOR A TEST WITH
CONSTANTS $\{C_i\}$

Here it is supposed that a test as described in
Equations 3.2 to 3.7 has the constants $\{C_i\}$ fixed at
some known numbers.

When a test terminates at the nth stage with $D_i$,

$$\sum_{j \ne i} P_j \, dF_j(v) < C_i P_i \, dF_i(v) \tag{3.8}$$

Neglecting the excess over the boundary allows Inequality
3.8 to be written as an approximation.

$$\sum_{j \ne i} P_j \, dF_j(v) \approx C_i P_i \, dF_i(v) \tag{3.9}$$

Now integrating over the region of v, $\psi_i^{(n)}$, such that
the test terminates at stage n with $D_i$ gives

$$\int_{\psi_i^{(n)}} \sum_{j \ne i} P_j \, dF_j(v) \approx \int_{\psi_i^{(n)}} C_i P_i \, dF_i(v) \tag{3.10}$$

which is the same as

$$\sum_{j \ne i} P_j \int_{\psi_i^{(n)}} dF_j(v) \approx C_i P_i \int_{\psi_i^{(n)}} dF_i(v) \tag{3.11}$$


But by Equation 1.15, the definition of $e_{ij}^{(n)}$, the
above becomes

$$\sum_{j \ne i} P_j e_{ij}^{(n)} \approx C_i P_i e_{ii}^{(n)} \tag{3.12}$$

Using the error probabilities when the test stops at the
nth stage, $e_{ij}^{(n)}$, the total error probabilities are

$$\sum_n \sum_{j \ne i} P_j e_{ij}^{(n)} \approx \sum_n C_i P_i e_{ii}^{(n)} \tag{3.13}$$

Interchange of summations on the left gives

$$\sum_{j \ne i} P_j \sum_n e_{ij}^{(n)} \approx C_i P_i \sum_n e_{ii}^{(n)} \tag{3.14}$$

Using the definition of the error probabilities $e_{ij}$,
Equation 1.17, this can be rewritten as

$$\sum_{j \ne i} P_j e_{ij} \approx C_i P_i e_{ii} \tag{3.15}$$

Making use of the definitions of the error of the first
kind $\alpha_i$ and the detection probability $\gamma_i$, Equation 3.15
can be written as

$$\alpha_i \approx C_i \gamma_i \tag{3.16}$$

The reasoning used so far gives an approximation of $C_i$.

$$C_i \approx \frac{\alpha_i}{\gamma_i} \tag{3.17}$$

One further condition is made to obtain a still simpler
estimate of the threshold $C_i$. If the detection rate is
high, it is approximately equal to the a priori
probability of that class.

$$\gamma_i \approx P_i \tag{3.18}$$

This suggests the following important point. If
one desires a recognition system with false declaration
probabilities $\{\alpha_i\}$ then one chooses

$$C_i = \frac{\alpha_i}{P_i} \tag{3.19}$$

In Appendix 2 the Chernoff Bound is used to show that
$\gamma_i \to P_i$ as $n \to \infty$. This will establish more firmly
the value of this approximation method.

This is a flexible algorithm. Output error rates
are under the control of the user.

Extensive experimental results reported in Chapter V
verify the usefulness of Approximation 3.19.
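
As a small illustration of how the thresholds might be set
in practice, the sketch below applies the approximation
$C_i = \alpha_i / P_i$ (the ratio of Equation 3.17 with the
detection probability replaced by the a priori
probability, Equation 3.18). The function name
`thresholds_from_false_alarm` and the numerical values are
hypothetical; $\alpha_i$ is taken, as in the text, to be the
joint probability of announcing class i falsely.

```python
def thresholds_from_false_alarm(alphas, priors):
    """Return C_i = alpha_i / P_i for each class (illustrative sketch)."""
    if len(alphas) != len(priors):
        raise ValueError("alphas and priors must have the same length")
    if any(p <= 0.0 for p in priors):
        raise ValueError("a priori probabilities must be positive")
    return [a / p for a, p in zip(alphas, priors)]

# Four equally likely classes, each with a desired false declaration
# probability of 0.01 (hypothetical design values).
C = thresholds_from_false_alarm([0.01] * 4, [0.25] * 4)
print(C)
```

Rare classes receive larger thresholds for the same
$\alpha_i$, since less prior mass is available to support
a declaration.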

B. RELATION TO THE BAYES TEST

This section shows how the proposed test relates to
the Bayes test for any fixed number of samples n. It is
shown first that the reject region is the same as a Bayes
test with particular loss functions. Next it is shown
that the decision regions are the same. In fact, the
proposed test is a Bayes test with a special cost
structure which permits rapid computation of the error
rates. This last computation is not generally feasible
for the Bayes test. But as shown in Section A of this
chapter, the error rates for the proposed test can be
readily computed and controlled.

The Bayes rule minimizes the average risk. In
Chapter II, Section A it was shown that $D_i$ is chosen
to minimize

$$\sum_{j=1}^{M} L_{ij} P_j \, dF_j(v) \tag{3.20}$$

over all i = 0,1,...,M.

Suppose that the Bayes rule rejects all hypotheses
after taking n samples. Then for every k = 1,2,...,M,

$$\sum_{j=1}^{M} L_{0j} P_j \, dF_j(v) < \sum_{j=1}^{M} L_{kj} P_j \, dF_j(v) \tag{3.21}$$

which can be rewritten as (for a particular k = i)

$$\sum_{j \ne i} (L_{ij} - L_{0j}) P_j \, dF_j(v) > (L_{0i} - L_{ii}) P_i \, dF_i(v) \tag{3.22}$$

For the proposed test a rejection occurs at the nth stage
whenever, for every i = 1,2,...,M,

$$\sum_{j \ne i} P_j \, dF_j(v) > C_i P_i \, dF_i(v) \tag{3.23}$$


Clearly the proposed test and the Bayes rule have
the same rejection criteria if, for all k = 1,2,...,M,

$$C_i = \frac{L_{0i} - L_{ii}}{L_{ik} - L_{0k}}, \quad k \ne i \tag{3.24}$$

The division implies that $(L_{ik} - L_{0k}) \ne 0$.

A solution to these equations can be obtained by
observation. They are, for $i \ne j \ne 0$,

$$L_{0i} = 0, \quad L_{ii} = -C_i, \quad L_{ij} = 1 \tag{3.25}$$

Clearly these loss functions satisfy

$$L_{0i} - L_{ii} = C_i \tag{3.26}$$

and

$$(L_{ik} - L_{0k}) = 1 \quad \text{for all } k \ne i \tag{3.27}$$

These assignments also satisfy one's intuition. If there
is a reject at the nth stage there is no loss because
another sample is taken and the test is continued.
However, if there is a misclassification, then a penalty of
one is assigned. On the other hand, when the correct
decision is made, a reward is given. If the permitted
error of the first kind $\alpha_i$ is large, so is $C_i$.
Thus, when a correct answer is given under conditions
allowing more errors, the reward is also larger.

Now suppose that a Bayes rule makes decision $D_i$,
$i \ne 0$, after n observations. Then,

$$\sum_{j=1}^{M} L_{ij} P_j \, dF_j(v) < \sum_{j=1}^{M} L_{kj} P_j \, dF_j(v) \tag{3.28}$$

for every k = 1,2,...,M.

Substitution for the loss functions yields

$$\sum_{j \ne i} P_j \, dF_j(v) - C_i P_i \, dF_i(v) < \sum_{j \ne k} P_j \, dF_j(v) - C_k P_k \, dF_k(v) \tag{3.29}$$

Eliminating the common terms from both sides gives

$$-(C_i + 1) P_i \, dF_i(v) < -(C_k + 1) P_k \, dF_k(v) \tag{3.30}$$

Hence the Bayes rule is

$$\max_{i \ne 0} \{ (C_i + 1) P_i \, dF_i(v) \} \tag{3.31}$$

It can be seen that the proposed test, Equation 3.7,
is indeed a special form of a Bayes rule.
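
The equivalence argued in this section can be checked
numerically for a single stage. In the sketch below, which
is an illustration rather than part of the original
derivation, `weights[j-1]` stands for $P_j \, dF_j(v)$, the
losses are zero for a reject, $-C_i$ for a correct decision
and one for a misclassification, and decision 0 means
"take another sample". The function names are
hypothetical.

```python
def bayes_decision(weights, thresholds):
    """Minimize the risk sum_j L_ij * P_j dF_j(v) over i = 0..M (Eq. 3.20)."""
    M = len(weights)

    def loss(i, j):
        if i == 0:
            return 0.0                  # reject: no loss
        if i == j:
            return -thresholds[i - 1]   # correct decision: reward C_i
        return 1.0                      # misclassification: unit penalty

    risks = [sum(loss(i, j) * weights[j - 1] for j in range(1, M + 1))
             for i in range(M + 1)]
    return min(range(M + 1), key=risks.__getitem__)

def proposed_decision(weights, thresholds):
    """Second form of the proposed test for one stage (Eqs. 3.6 and 3.7)."""
    scores = [(c + 1.0) * w for c, w in zip(thresholds, weights)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return 0 if sum(weights) > scores[best] else best + 1

cases = [
    ([0.3, 0.3, 0.4], [0.05, 0.05, 0.05]),      # no class dominates: reject
    ([0.001, 0.002, 0.9], [0.05, 0.05, 0.05]),  # class 3 dominates
    ([0.5, 0.02], [0.01, 0.1]),                 # not yet below threshold
    ([0.5, 0.004], [0.01, 0.1]),                # class 1 accepted
]
for w, c in cases:
    print(bayes_decision(w, c), proposed_decision(w, c))
```

Exact ties between the reject risk and a decision risk
are ignored in this sketch; they occur with probability
zero for continuous observations.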

C. TERMINATION

It will be shown that the proposed test terminates
almost surely. Doob [Ref. 28, p. 349] shows the
convergence of the probability ratio

$$\frac{dF_j(v)}{dF_i(v)} \to 0 \quad \text{a.s.} \tag{3.32}$$

given $H_i$. This fact will be used below.

Now suppose that some hypothesis $H_i$ is given. Recall
from Equation 3.3 that the test terminates whenever

$$C_i > \frac{\sum_{j \ne i} P_j \, dF_j(v)}{P_i \, dF_i(v)} \tag{3.33}$$

That is, whenever the ratio is less than $C_i$ the test ends.
It will be shown that the ratio in Equation 3.33 will
approach zero with probability one. The right side of
Equation 3.33 can be written as

$$\sum_{j \ne i} \frac{P_j \, dF_j(v)}{P_i \, dF_i(v)} = \sum_{j \ne i} \frac{P_j}{P_i} Z_n(i,j) \tag{3.34}$$

where by Definition 1.6

$$Z_n(i,j) = \frac{dF_j(v_1, v_2, \ldots, v_n)}{dF_i(v_1, v_2, \ldots, v_n)} \tag{3.35}$$

But by Equation 3.32

$$Z_n(i,j) \to 0 \quad \text{a.s. for all } j \ne i$$
Hence the test terminates almost surely.

In any practical application of a sequential test,
one must consider a large number such that when the sample
number reaches this number the test is arbitrarily
terminated. Because the error rates go to zero as n becomes
large, this safeguard should not affect the performance
significantly. In the tests reported in Chapter V the
arbitrary termination point is placed at about ten times
the average sample number. In more than five thousand
cases which were run, not one test had a sample number
this large.

D. RELATION TO WALD'S TEST 

The proposed test for the two-class problem is clearly
Wald's test. The proposed test takes another sample if

$$P_2 \, dF_2(v) > C_1 P_1 \, dF_1(v) \tag{3.36}$$

and

$$P_1 \, dF_1(v) > C_2 P_2 \, dF_2(v) \tag{3.37}$$

But rewriting these equations we get

$$C_1 < \frac{P_2 \, dF_2(v)}{P_1 \, dF_1(v)} < \frac{1}{C_2} \tag{3.38}$$

This shows that the proposed test takes another sample
if the ratio of probability has not crossed either of
the two boundaries.

Equation 3.38 is the same as Equation 2.41 with the
lower boundary

$$A_k = C_1 \tag{3.39}$$

and the upper boundary

$$A_i = C_2^{-1} \tag{3.40}$$

E. APPROXIMATION OF THE AVERAGE SAMPLE NUMBER

In Section C the termination of the proposed test
with probability one was shown. This section addresses
the problem of computing an approximation of the average
number of samples at termination under $H_i$.

Two assumptions will be made to assist in this
approximation. First, assume that some one probability
distribution causes the delay of a decision or the
incorrect classification. Second, assume that when any
decision $D_m$ is made, m = 1,2,...,M,

$$\sum_{j \ne m} P_j \, dF_j(v) \approx C_m P_m \, dF_m(v) \tag{3.41}$$

This is sometimes known as neglecting the excess over
the threshold. It is similar to assuming that each step
toward a goal is small and that the goal is far. Hence,
when the goal is crossed, the position is near the goal.

With these assumptions, Equation 3.3 becomes

$$\sum_{j \ne i} P_j \, dF_j(v) \approx C_i P_i \, dF_i(v) \tag{3.42}$$

when the correct decision is made, and

$$\sum_{j \ne k} P_j \, dF_j(v) \approx C_k P_k \, dF_k(v) \tag{3.43}$$

when an incorrect decision is made.

By invoking the first assumption that $F_k$ causes
the delay, the sum is approximated by one term. Hence,

$$\frac{dF_k(v)}{dF_i(v)} \approx \frac{C_i P_i}{P_k} \tag{3.44}$$

when the correct decision is made, and

$$\frac{dF_k(v)}{dF_i(v)} \approx \frac{P_i}{C_k P_k} \tag{3.45}$$

when an incorrect decision is made.

N is the sample number at termination. Taking the
logarithm of the ratio of probabilities at termination
and denoting it $Z_N(i,k)$ gives

$$Z_N(i,k) \approx \ln \frac{dF_k(v)}{dF_i(v)} \approx \ln \frac{C_i P_i}{P_k} \tag{3.46}$$

when the correct decision is made, and

$$Z_N(i,k) \approx \ln \frac{dF_k(v)}{dF_i(v)} \approx \ln \frac{P_i}{C_k P_k} \tag{3.47}$$

when the incorrect decision is made.

Under the ith hypothesis, decision $D_i$ is made with
probability

$$\text{Prob}\{D_i \mid H_i \text{ true}\} = e_{ii} \tag{3.48}$$

The probability of an error is

$$\text{Prob}\{D_k \mid H_i \text{ true}\} = 1 - e_{ii} \tag{3.49}$$

Hence, the conditional expectation of the logarithm of
the ratio of the probabilities at termination,
$\mathcal{E}_{H_i}(Z_N(i,k))$, can be computed.

$$\mathcal{E}_{H_i}(Z_N(i,k)) \approx e_{ii} \ln \frac{C_i P_i}{P_k} + (1 - e_{ii}) \ln \frac{P_i}{C_k P_k} \tag{3.50}$$

But from Appendix 3,

$$\mathcal{E}_{H_i}(Z_N(i,k)) = \mathcal{E}_{H_i}(z(i,k)) \, \mathcal{E}_{H_i}(N) \tag{3.51}$$

Therefore, the average sample number given $H_i$ is
approximately

$$\mathcal{E}_{H_i}(N) \approx \frac{e_{ii} \ln \dfrac{C_i P_i}{P_k} + (1 - e_{ii}) \ln \dfrac{P_i}{C_k P_k}}{\mathcal{E}_{H_i}(z(i,k))} \tag{3.52}$$

In some applications a portion of the observations
is not used. In particular, when the number of
intersections that a random line makes with a pattern is used
as a feature, neglecting those observations with zero
intersections and using the conditional probability
distributions provide for size invariance. This is
more carefully presented in the following chapters.

The estimate of the average sample number, Equation
3.52, is an approximation of the average number of
samples used per test. To obtain the average number of
samples observed one must modify Equation 3.52 to reflect
the fraction of the observations which are neglected.

Extensive experiments reported in Chapter V show
that this estimate is a good one.
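
Equation 3.52 is simple to evaluate once the per-sample
expectation of the log ratio is known. The sketch below is
illustrative only: the function names and the two discrete
distributions are hypothetical, and z(i,k) is taken to be
the logarithm of the single-sample ratio
$dF_k(v_n)/dF_i(v_n)$, whose expectation under $H_i$ is
negative.

```python
import math

def mean_log_ratio(pmf_i, pmf_k):
    """E under H_i of ln(f_k(v) / f_i(v)) for discrete distributions.

    Both arguments list probabilities over the same outcomes; f_k is assumed
    positive wherever f_i is positive.
    """
    return sum(pi * math.log(pk / pi) for pi, pk in zip(pmf_i, pmf_k) if pi > 0.0)

def average_sample_number(e_ii, c_i, c_k, p_i, p_k, mean_z):
    """Approximate average sample number under H_i (Eq. 3.52)."""
    numerator = (e_ii * math.log(c_i * p_i / p_k)
                 + (1.0 - e_ii) * math.log(p_i / (c_k * p_k)))
    return numerator / mean_z

# Hypothetical two-outcome feature, equal priors, thresholds of 0.05.
mean_z = mean_log_ratio([0.8, 0.2], [0.2, 0.8])
asn = average_sample_number(1.0, 0.05, 0.05, 0.5, 0.5, mean_z)
print(round(mean_z, 4), round(asn, 2))
```

When some observations are discarded, as described above,
this figure counts only the samples used; the number of
samples observed is larger by the reciprocal of the
fraction retained.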
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CHAPTER IV 


INVARIANT FEATURE EXTRACTION 

A. HEURISTIC DISCUSSION 

In Chapter II some methods of taking data were 
mentioned. Many of these methods are sensitive to the 
location and orientation of the object under study. In 
this chapter methods which are not dependent on these 
factors are developed. The cost of aligning the patterns 
to the transducer motivates such a development. 

In hand-printed material, the reasons for variation 
in location and orientation are clear. There are simi- 
lar reasons why impact printed characters also have 
alignment irregularities. For example in high speed 
computer printouts the characters may be misplaced hori- 
zontally or vertically. This is due to variations in 
the way the hammer strikes the moving letter form. In 
aerial photographic reconnaissance, the significance of 
the data may be unrelated to either the precise location 
or the orientation of the object being sought. If one 
is scanning a picture for airfields, its recognition 
should not be affected by its whereabouts. 

There is a need for an observation scheme which is
invariant to certain features. Precise location and
orientation are two aspects of geometric subjects which
do not contribute to their classification. A square and
a rectangle ought to be different no matter where they
are in the field of view. Another feature which is often
unimportant from one class to another is size.

This is not to say that location, orientation, and
size are unimportant. Often these features give
different meanings to the same symbol. Arrows are good
examples: two arrows pointing in opposite directions
have opposite meanings. Also observe how these same^1
shapes, b d p q, are used to represent quite different
things. The search for an invariant feature extraction
method which will classify these letters in the same
class is still fruitful. There are other methods which
can subclassify them.

B. INVARIANCE

Invariance of decision rules is discussed in detail
in Ferguson [Ref. 30, p. 144]. It is defined for a group^2
of measurable^3 transformations over the space upon which
decision theory is founded. Here the concern is over
the invariance of the observations when the
transformations are applied to the patterns in the retina.
The decision algorithm is constant.

1 These shapes are rotational and mirror-image
transformations of one another.

2 For the exact definition of group see Birkhoff
[Ref. 31, p. 117].

3 It must be measurable to assure that a random
variable X is transformed into a random variable g(X).
See Breiman [Ref. 29, p. 106].

What features are insensitive to particular trans- 
formations? How can observations be taken so that they 
are independent of the transformations? It is shown in 
Section D of this chapter that randomizing answers these
questions.

C. FEATURES

Any aspect or quantity derived from a pattern is a
feature. The word feature is used to mean a scalar,
vector or matrix quantity. The area, the perimeter, the
convex hull perimeter and the "convexity" of a geometric
shape are four examples of features. The gray level of
a matrix scan is an example of a widely used feature.

It is tempting to try to measure the usefulness of
a feature. However, it is quite difficult to assign a
numerical quantity to the usefulness of a feature.
Investigators have used the entropy^1 of features as a
measure. Others have used variance. It is unclear how
either of these quantities relates to the fundamental
quantities of pattern recognition (error and reject
rates). Since the object of pattern recognition is to
give the best classification as quickly as possible, a
feature is best if a decision algorithm requires the
least number of samples of that feature.

1 For a definition see Ash [Ref. 32, p. 24].

A good feature is one which requires a minimum num- 
ber of samples for a given decision algorithm operating 
at certain error and reject rates, and which can be ex- 
tracted with a minimum of effort. 

D. RANDOM EXTRACTION 

One way to take data is by cross-correlating the
pattern with certain reference elements. These reference
elements may be the most primary elements of geometry
or they may be as complex as the prototypes of the
patterns. Here the basic elements of geometry are chosen
as the reference elements due to the ease with which
they can be generated. They are taken to be
appropriately distributed within the retina. The reference
elements are taken at random. This is done so that the
features will no longer be dependent upon the location
and orientation.

As an illustration of feature extraction (see Figure 4-1) let the
reference set be the points in the retina. The elements are chosen
one by one, at random and uniformly. These are correlated with
the pattern. The result is a random variable,

$$
X(x,y) =
\begin{cases}
1 & \text{if } (x,y) \in \text{Pattern Interior} \\
0 & \text{if } (x,y) \in \text{Pattern Exterior}
\end{cases}
\tag{4.1}
$$


Clearly X is a feature of the pattern. It is independent
of where the pattern is in the retina. Also functions of
X are features of the pattern.

Observe that the mean of X is proportional to the
area of the pattern.

$$
\frac{\text{Area Pattern}}{\text{Area Retina}} = \mathcal{E}(X)
\tag{4.2}
$$


Unfortunately in most geometric pattern recognition 
problems this data is insufficient for classification 
because many different shapes may have the same area. 
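The random-point feature of Equations 4.1 and 4.2 can be sketched numerically. The following is a minimal illustration, not the procedure used in this report: the rectangular pattern, the unit retina radius, the sample count, and the names `make_rectangle` and `estimate_mean_x` are all assumptions introduced for the example. It estimates the mean of X for the same rectangle at two locations.

```python
import math
import random

def make_rectangle(cx, cy, width, height):
    """Hypothetical pattern: indicator of an axis-aligned rectangle centred at (cx, cy)."""
    def inside(x, y):
        return abs(x - cx) <= width / 2 and abs(y - cy) <= height / 2
    return inside

def estimate_mean_x(inside, retina_radius, n_points, rng):
    """Average of X(x, y) from Eq. 4.1 over points drawn uniformly in a circular retina."""
    hits = 0
    for _ in range(n_points):
        # Rejection sampling: draw in the bounding square, keep points inside the disk.
        while True:
            x = rng.uniform(-retina_radius, retina_radius)
            y = rng.uniform(-retina_radius, retina_radius)
            if x * x + y * y <= retina_radius ** 2:
                break
        hits += 1 if inside(x, y) else 0
    return hits / n_points

rng = random.Random(0)
expected = (0.6 * 0.4) / (math.pi * 1.0 ** 2)   # Area Pattern / Area Retina
centered = estimate_mean_x(make_rectangle(0.0, 0.0, 0.6, 0.4), 1.0, 20000, rng)
shifted = estimate_mean_x(make_rectangle(0.3, -0.2, 0.6, 0.4), 1.0, 20000, rng)
print(expected, centered, shifted)
```

Under these assumptions both estimates should agree with the area ratio to within sampling error, which illustrates that the mean of X does not depend on where the pattern lies in the retina.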



Fig. 4-1. Random Points Used as a 
Feature Extraction 
Next consider the set of all lines intersecting
the retina. These lines may be parameterized by the
polar coordinates $(P, \theta)$ of the point on the line closest
to the origin. Random lines uniformly distributed over the
retina may be obtained by choosing $P$ and $\theta$ uniformly in
$0 < P \le R$, $0 < \theta < 2\pi$, respectively, where R is the
retina radius. (See Appendix 4.)

Features of the pattern may be obtained by observ- 
ing the cross-correlation of such lines with the pattern. 
Let X be equal to the total length of the intersection 
of the line with the pattern. It is clear that X is 
independent of where the pattern is situated. X is a 
random variable which depends only on the pattern itself. 
The properties of this random variable and other random
variables derived from random lines that intersect the
pattern are discussed fully in the next section.
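A rough sketch of this measurement is given below. It assumes, purely for illustration, that the pattern is a list of axis-aligned rectangles `(x_min, y_min, x_max, y_max)` and that the retina has unit radius; the function names are hypothetical and the clipping method is a standard slab test, not a procedure taken from this report.

```python
import math
import random

def random_line(retina_radius, rng):
    """Draw (P, theta): P uniform on (0, R], theta uniform on [0, 2*pi)."""
    p = retina_radius * (1.0 - rng.random())
    theta = 2.0 * math.pi * rng.random()
    return p, theta

def chord_length(p, theta, rect):
    """Length of the intersection of the line (p, theta) with one rectangle."""
    # Foot of the perpendicular from the origin, and the unit direction of the line.
    px, py = p * math.cos(theta), p * math.sin(theta)
    dx, dy = -math.sin(theta), math.cos(theta)
    t_lo, t_hi = -math.inf, math.inf
    for origin, direction, lo, hi in ((px, dx, rect[0], rect[2]),
                                      (py, dy, rect[1], rect[3])):
        if abs(direction) < 1e-12:
            if origin < lo or origin > hi:
                return 0.0            # parallel to this slab and outside it
            continue
        t1, t2 = (lo - origin) / direction, (hi - origin) / direction
        t_lo = max(t_lo, min(t1, t2))
        t_hi = min(t_hi, max(t1, t2))
    return max(0.0, t_hi - t_lo)

def total_intersection(p, theta, rects):
    """X: total length of intersection of one line with a pattern of rectangles."""
    return sum(chord_length(p, theta, r) for r in rects)

rng = random.Random(0)
pattern = [(-0.3, -0.2, 0.3, 0.2)]     # hypothetical 0.6 x 0.4 block
moved = [(0.0, -0.4, 0.6, 0.0)]        # the same block, translated
lines = [random_line(1.0, rng) for _ in range(20000)]
mean_x = sum(total_intersection(p, t, pattern) for p, t in lines) / len(lines)
mean_x_moved = sum(total_intersection(p, t, moved) for p, t in lines) / len(lines)
```

With these assumed values the two sample means should agree to within sampling error, in line with the statement that X depends only on the pattern itself.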

Other geometric elements can be used as the reference set
(Figure 4-2). However, when the elements become complex,
the process of making their measure not dependent on
orientation and location also becomes complex. Random
ellipses can be used as a basis for feature extraction.
Random variables can be defined in terms of the length
of intersection, number of intersections, etc. Circles,
lines, and points are degenerate forms of ellipses. It
is questionable whether these random variables can be
made independent of the location and orientation of the
figures. Also the computational disadvantages of using
higher order elements limit the scope of this dissertation
to the use of random lines.

Fig. 4-3. Uniform Random Lines in a Retina

E. PATTERNS AND RANDOM LINES 

Random lines have been defined earlier. (Also see
Appendix 4.) Figure 4-3 shows a field of random lines.
Patterns have not been given a formal definition and
will be defined only implicitly by examples. In Figure
4-4 a pattern is represented by n rectangles $R_1, R_2, \ldots, R_n$.
These n rectangles jointly form a pattern. They are
disjoint so that the concept of a line intersecting
a pattern can be clearly illustrated. The two segments
$w_0$ and $w_{n+1}$ are dependent upon the position and
orientation of the pattern. The $w_i$ represents the intersection
of the line with the ith region of the pattern. There
are many functions that can be formed from the $w_i$. A few
of them are:

a) a multivariate random variable

   $$ W = (w_1, w_2, \ldots, w_n) \tag{4.3} $$

b) the largest intersection segment

   $$ U = \max(w_1, w_2, \ldots, w_n) \tag{4.4} $$

c) the smallest non-zero segment

   $$ V = \min(w_1, w_2, \ldots, w_n) \tag{4.5} $$

d) the number of intersections

   $$ N = \sum_{i=1}^{n} \operatorname{sgn}(w_i) \tag{4.6} $$

   where sgn(·) is the sign function, +1 when the
   argument is positive, -1 when negative (and 0 when
   the argument is zero)

e) the total length of intersection

   $$ X = \sum_{i=1}^{n} w_i \tag{4.7} $$

f) the joint random variable

   $$ Z = (N, X) \tag{4.8} $$
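The functions listed in Equations 4.3 to 4.8 can be written out in a few lines. This is only an illustrative sketch: the function name `line_features` and the sample segment lengths are assumptions, and the segment lengths are taken to be non-negative.

```python
def line_features(w):
    """Features of Eqs. 4.3-4.8 from the segment lengths w_1..w_n (all >= 0)."""
    W = tuple(w)                                   # Eq. 4.3
    nonzero = [wi for wi in w if wi > 0]
    U = max(w) if w else 0.0                       # Eq. 4.4, largest segment
    V = min(nonzero) if nonzero else 0.0           # Eq. 4.5, smallest non-zero segment
    N = sum(1 for wi in w if wi > 0)               # Eq. 4.6, sum of sgn(w_i) for w_i >= 0
    X = sum(w)                                     # Eq. 4.7, total length
    Z = (N, X)                                     # Eq. 4.8, joint variable
    return W, U, V, N, X, Z

# One hypothetical line crossing two of four regions.
W, U, V, N, X, Z = line_features([0.0, 2.5, 0.0, 1.0])
```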


The multivariate random variable W has all the
information contained in the other random variables.
But the computational requirements to estimate, store,
and use multivariate random variables are severe. Hence
for these reasons, and not on the basis of theory, the
multivariate random variables are no longer considered.

The random variable U swamps the small contributions
of the lesser $w_i$. Yet it may be these small quantities
which make the pattern different. V is formed by the
smallest $w_i$, and it is susceptible to noise. These two
random variables will no longer be considered.

The number of intersections N has interesting
properties. It was pointed out that its average was
related to the perimeter if the pattern is convex. (See
Chapter II.)

$$
\mathcal{E}N = \text{Perimeter}
\tag{4.9}
$$

For nonconvex figures $\mathcal{E}N$ can be used as an indicator of
its convexity^1.

$$
\text{Perimeter of convex hull} = \int_{\text{Pattern}} dP\,d\theta
< \text{Perimeter} = \int_{\text{Pattern}} N\,dP\,d\theta = \mathcal{E}N
\tag{4.10}
$$

The probability density function of N is more interesting.
Clearly the probability of intersection is determined
by the convex hull of the pattern. For a convex pattern
there is one intersection. But an arbitrary shape has a
probability density function which depends on the figure.
As will be illustrated in Chapter V, this feature is
useful when the figure is narrow or when the width of a
figure is of no consequence.

Figure 4-5 displays the probability density function
of N. Random lines are thrown against block letters.

1 See Bali [Ref. 21, p. 38].

Fig. 4-5. Probability of Number of Intersections of
Random Lines against Block H and U
(horizontal axis: Number of Intersections)

Fig. 4-6. Detail View of Random Lines Intersecting
the Letter H

Fig. 4-7. Detail View of Random Lines Intersecting
the Letter U

The total length of intersection, random variable X,
is also a useful quantity because this scalar number is
easy to extract. A movable spot scanner can be used, for
instance. Figure 4-6 shows how random lines may
intersect a block H and Figure 4-7 illustrates random lines
intersecting a block U. The outlines of the block letters
were omitted to stress the point that not many lines are
needed for a person to decide what the pattern is. X
seems to be promising as an input to the decision
algorithm of Chapter III. It will be shown in the next
chapter that indeed quick decisions can be obtained by using
X as an input.

Z is the joint random variable. Its two components
are X and N. One may need to use Z when either N or X 
alone produces unsatisfactory results due to noise, font, 
or style changes. 

F. NOISE, SIZE, FONT, AND STYLE 

A few factors which affect the random variables N
and X are noise, size, font, and style. Noise refers
to smudges, distortions, or breaks in the pattern due to
the printing, the paper, or the photographic process.
Also the texture of the background is considered noise.
Font refers to the various printing faces. There are so
many fonts that even the best reading machines available
today can handle only a small percentage of them. The
problems involved in reading handwritten material are
obvious.

Noise affects N more than X. This is due to the
fact that X is an "integral" of the overlap. A small
extraneous blob affects X only slightly. On the other 
hand, whenever a random line intersects such a blob, N 
is made to differ by one, which is a significant change. 
In most character patterns N is most likely to be less 
than four. 

Size does not affect the conditional probability 
density function of N, for N not equal to zero. Size 
changes X proportionally. However, changes in size can
be dealt with if those changes occur "slowly" or
"infrequently" by putting X through an automatic gain
control.

How font changes affect X and N is a question that 
can be answered experimentally. The variations in the 
fonts are subtle and cannot be handled analytically. 

All the questions associated with font and style 
are complex. Further experiments are needed to find 
cross-font invariant features. X and N seem to be good 
random features. 
CHAPTER V


EXPERIMENTAL RESULTS 

In Chapter III a multiclass hypothesis test which
performs at the desired error rates is developed. It
allows rejects to occur. Upon a reject, the test
continues by adjoining an additional sample to the
observation. It is shown that this test is Bayes. In the
case where there are only two classes this test is the
same as Wald's sequential probability ratio test.

Feature extraction is discussed in Chapter IV. Two
random variable features X and N which are invariant to
translation and rotation are found. They meet the
needs of the multiclass hypothesis test. $\{X_i\}$ are
independent and may be obtained quickly. The same comments
hold true for $\{N_i\}$. For a given figure in a retina, the
sequence of $\{N_i\}$ or $\{X_i\}$ is virtually limitless.

Two experiments are described in this chapter. The
first experiment has to do with block letters, and the
random variable X, the total length of a random line
intersection. In the second experiment, handwritten
numerals are classified using N, the number of
intersections a random line makes with the numerals.
A. CLASSIFICATION OF BLOCK LETTERS

Block letters are used in this experiment. Their
shapes are illustrated in Figures 4-6 and 4-7. Two
similar letters were chosen. These two letters have
equal areas and the same convex hull areas. They differ
in 22% of the area. Certainly if such letters can be
classified, there is hope for more differing letters.

Random observations X are made. X is the total
length of intersection that a uniformly distributed
random line makes with a block letter. The random lines
are taken one at a time, independent of each other.
Appendix 4 describes the theory of choosing uniform
independent random lines. Hence, $\{X_i\}$ are clearly
independent.

In the proposed test the probability distribution
functions of X, given each letter, are prerequisites.
Hence the first step is to learn these distributions.
This is done empirically because the mathematics
available today (such as geometric probability) allows the
direct computation of only a few of the simple moments
of the random variables.

Experimentally it is noticed that the p.d.f. changes
hardly at all after 5,000 samples are tabulated. The
p.d.f.'s used in this experiment are estimated by 50,000
samples of X.
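The tabulation step can be sketched as a relative-frequency histogram. The example below is only an illustration: the bin width, the number of bins, and the triangular stand-in sampler are assumptions (any source of X samples would take its place), and the L1 distance is one possible way to quantify how little the estimate changes between 5,000 and 50,000 samples.

```python
import random

def empirical_pdf(samples, bin_width, n_bins):
    """Relative-frequency histogram; the last bin collects any overflow."""
    counts = [0] * n_bins
    for s in samples:
        k = min(int(s // bin_width), n_bins - 1)
        counts[k] += 1
    total = len(samples)
    return [c / total for c in counts]

def l1_distance(p, q):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(p, q))

rng = random.Random(1)
# Stand-in data source (an assumption): replace with a sampler of X.
draw = lambda: rng.triangular(0.0, 200.0, 50.0)
pdf_5k = empirical_pdf([draw() for _ in range(5000)], 10.0, 20)
pdf_50k = empirical_pdf([draw() for _ in range(50000)], 10.0, 20)
change = l1_distance(pdf_5k, pdf_50k)
```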
Figure 5-1 illustrates the empirical p.d.f. The
peaks at 50, 100, 150, and 200 units correspond to the
dimensions of the block letter H. The peak at 140 for
the letter U is due to the horizontal area. For the
letter U it is on the bottom and for the letter H it is
in the middle of the letter. The p.d.f.'s at zero are
omitted from the diagram because they are the same for
all convex hulls of the same perimeter. In fact, the
p.d.f.'s are proportional to the ratio of the convex
hull perimeter and the perimeter of the retina. See
Kendall and Moran [Ref. 20, p. 58].

The difference between the two conditional random 
variables is more apparent in Figure 5-2 where the 
cumulative distribution functions are displayed. 

The average number of samples needed to come to a
decision is a function of the error probability which
one desires (Chapter III). Figure 5-3 displays the
average sample number as a function of the significance of
the test. Figure 5-4 shows four decades of this
relationship. The errors,

$$
e_{12} = e_{21}
\tag{5.1}
$$

are held constant with respect to each other.

The samples $x = (x_1, x_2, \ldots, x_n)$ are independent.
This makes the computation of $P_i\,dF_i(x)$ extremely simple.
Fig. 5-3. Average Sample Number at Termination
for the Block Letters H and U
(horizontal axis: Error Rate)

Fig. 5-4. Average Sample Number at Termination
for Block Letters H and U with
Experimental Points
(horizontal axis: Error Rate)
When n samples are used,

$$
P_i\,dF_i(x) = P_i \prod_{j=1}^{n} dF_i(x_j)
\tag{5.2}
$$

The evaluation of this function after another observation
becomes

$$
P_i\,dF_i(x) = P_i \prod_{j=1}^{n+1} dF_i(x_j)
\tag{5.3}
$$

$$
= dF_i(x_{n+1})\, P_i \prod_{j=1}^{n} dF_i(x_j)
\tag{5.4}
$$


The probability ratios are tested against thresholds
as indicated by Equations 3.38 to 3.40.

$$
A_2 = C_2 < \frac{P_1\,dF_1(x)}{P_2\,dF_2(x)} < \frac{1}{C_1} = A_1
\tag{5.5}
$$

If the boundaries are exceeded a decision is made,
whereas if neither boundary is crossed, further samples
are taken.

In the following tests the classes are equally
likely.

$$ P_1 = 1/2 \tag{5.6} $$

$$ P_2 = 1/2 \tag{5.7} $$

In these experiments, no rejects are allowed after
200 samples. They are then classified using the maximum
likelihood test. This is done to limit the computing
time.
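The two-class procedure described above (recursive update of the probability ratio, comparison against two thresholds, and truncation at 200 samples) can be sketched as follows. This is a hedged illustration rather than the program used for the experiments: the two discrete p.d.f.'s are invented for the example, the function name `sequential_test` is hypothetical, and the thresholds C_1 = C_2 = 1/9 are chosen only so that the log limits are about ±2.20, close to the ±2.19 marked on the axis of Figure 5-5.

```python
import math
import random

def sequential_test(draw, pmf1, pmf2, c1, c2, p1=0.5, p2=0.5, max_samples=200):
    """Two-class sequential test in the form of Eq. 5.5, truncated at max_samples.

    Returns (decision, samples_used); decision is 1 or 2.
    """
    log_prior = math.log(p1 / p2)
    log_ratio = log_prior                    # ln(P1 dF1 / P2 dF2) before any sample
    upper, lower = math.log(1.0 / c1), math.log(c2)
    for n in range(1, max_samples + 1):
        x = draw()
        # Recursive update of Eqs. 5.3-5.4, carried out in logarithms.
        log_ratio += math.log(pmf1[x]) - math.log(pmf2[x])
        if log_ratio > upper:
            return 1, n
        if log_ratio < lower:
            return 2, n
    # No reject allowed past the truncation point: maximum likelihood decision.
    return (1 if log_ratio - log_prior >= 0 else 2), max_samples

rng = random.Random(2)
pmf_a = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # hypothetical class-1 p.d.f.
pmf_b = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}   # hypothetical class-2 p.d.f.
values, weights = list(pmf_a), list(pmf_a.values())
draw_a = lambda: rng.choices(values, weights)[0]   # samples when class 1 is true

results = [sequential_test(draw_a, pmf_a, pmf_b, c1=1 / 9, c2=1 / 9) for _ in range(2000)]
error_rate = sum(1 for d, _ in results if d == 2) / len(results)
average_samples = sum(n for _, n in results) / len(results)
```

Working with the logarithm of the ratio turns the product of Equation 5.4 into a running sum, which avoids numerical underflow when many samples are taken.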

Two runs of experiments are made. In the first run
the algorithm is requested to make a 1% test, $e_{21} = e_{12}
= 0.01$. Point A of Figure 5-4 is obtained from 7600
tests. The average number of samples used for a test
is 76. The point is shifted to the right from the
predicted position. This is due, in part, to the
arbitrary truncation of the test at 200 samples.

In the second run the algorithm is requested to
make a 10% test. Its actual performance is Point B of
Figure 5-4. It requires 34 samples on the average. It
uses more samples than predicted, but the decision is
better.

It is interesting to observe the behavior of the
likelihood ratio. It is a random walk biased upwards by

$$
\mathcal{E}\,\frac{P_1\,dF_1}{P_2\,dF_2}
\tag{5.8}
$$

The logarithm of the ratio is displayed in Figure 5-5
along with the logarithm of the limits $A_1$ and $A_2$. Four
tests are detailed, step by step. Tests 1 and 2
terminate well below the expected average sample number and
Tests 3 and 4 terminate above it.
Fig. 5-5. Random Walk of the Logarithm
of the Probability Ratio
(vertical axis: ln (Probability Ratio), limits at 2.19 and -2.19;
horizontal axis: Step Number)







Figure 5-6 displays the actual lines which are used
in Test 2 of Figure 5-5. Lines 1 and 2 of Figure 5-6,
Parts a and b, are different for each letter, whereas
Line 3 is the same for each figure. Lines 1 and 2 give
information about H and U but Line 3 does not.

B. HANDWRITTEN NUMERALS

Four digits illustrated in Figure 5-7 are used to
learn the probability density functions of N, the number
of times a random line intersects a given figure. These
p.d.f.'s are learned by using many random lines. Figure
5-8 illustrates their nature. Figure 5-9 shows the
cumulative distribution functions for the numbers 2, 3,
4, and 5. They are superimposed on one drawing so that
the differences will be apparent. For these experiments
20,000 lines are used to estimate each p.d.f.

The Prob{N=0} is used to normalize the p.d.f. This
is the same as normalizing by size since the probability
of intersection is directly proportional to the convex
hull of the character.
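The normalization can be illustrated with a short sketch. The tabulated frequencies below are invented for the example, and the function name `condition_on_intersection` is hypothetical; the sketch simply conditions the p.d.f. of N on the event that the line intersects the figure.

```python
def condition_on_intersection(pmf_n):
    """Renormalize a p.d.f. of N so that it is conditioned on N != 0."""
    p_hit = 1.0 - pmf_n.get(0, 0.0)
    if p_hit <= 0.0:
        raise ValueError("the pattern is never intersected")
    return {n: p / p_hit for n, p in pmf_n.items() if n != 0}

# Hypothetical tabulated frequencies of N for one numeral.
raw = {0: 0.60, 1: 0.20, 2: 0.15, 3: 0.05}
conditional = condition_on_intersection(raw)
```

Because the probability of a miss is removed, two copies of the same numeral at different sizes would give the same conditional p.d.f. under this scheme.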

Using the formulas which are developed in Chapter
III the average number of samples needed can be computed
for any given error rate. Figure 5-10 displays such a
relationship for all $\alpha_i$ equal.

Tests are run for Points A and B of Figure 5-10.
Point A is due to 782 tests for the class. The $\{\alpha_i\}$
are specified at .025. Point B is due to 637 tests for
that class with $\{\alpha_i\}$ specified at .0125. The four
hypotheses are assumed equally likely. The results of
the test are listed in Tables 5-3 to 5-6.

Fig. 5-7. Handwritten Numerals Used
in the Experiment

Fig. 5-8. Probability of n Crossings Given an
Intersection for Each Number
(four panels; horizontal axis: Number of Crossings)

Fig. 5-9. Cumulative Distribution Functions of n
Crossings Given an Intersection for Each Number
(vertical axis: Probability; horizontal axis: Number of Crossings)

Fig. 5-10. Average Sample Number at Termination for
Class 2 with Experimental Points
(horizontal axis: Error Rate, $\alpha_2$)

Recall from Equation 1.23 that

$$
\alpha_i = \sum_{j \ne i} P_j\, e_{ij}
\tag{5.9}
$$

The error rates can be easily computed for each
experiment as shown in Tables 5-1 and 5-2. These tables show
that the performance of the proposed test can be
controlled by the experimenter.
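As a worked illustration of Equation 5.9, the sketch below computes the false alarm rates from the counts of Table 5-3. It rests on two stated assumptions: the entry for decision D1 given H3 is read as 4 (the value that makes that column total 782), and the four classes are equally likely. The function name `false_alarm_rates` is hypothetical.

```python
# Counts from Table 5-3: rows are decisions D1..D4, columns are true classes H1..H4.
counts = [
    [654,  14,   4,  30],
    [ 20, 758,   5,   4],
    [ 28,   9, 766,  18],
    [ 80,   1,   7, 730],
]
tests_per_class = 782
priors = [0.25] * 4      # equally likely hypotheses

def false_alarm_rates(counts, tests_per_class, priors):
    """alpha_i = sum over j != i of P_j * e_ij (Eq. 5.9)."""
    m = len(counts)
    e = [[counts[i][j] / tests_per_class for j in range(m)] for i in range(m)]
    return [sum(priors[j] * e[i][j] for j in range(m) if j != i) for i in range(m)]

alphas = false_alarm_rates(counts, tests_per_class, priors)
```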

Tables 5-7 and 5-8 show the average number of samples
taken for the various results, $D_i$ given $H_j$. These numbers
include the samples for which there is zero intersection.
An estimate of the average sample number can be computed
using Equation 3.52. The plot in Figure 5-10 reflects
these estimates. The curve should be below the
experimental points as a result of the estimates made in
Equations 3.41 to 3.52.

The experimental results tabulated in Tables 5-3
and 5-5 show that 2 and 5 are similar. The majority of
the errors made when 5 is true is the decision 2. This
can be anticipated by observing the probability
distribution function of N for these two letters (Figure 5-7)
or by simply noting that 2 is quite like the upside-down
image of 5. The result is the large average sample
number at termination.

Table 5-1. Error Rates for the First Experiment

Table 5-2. Error Rates for the Second Experiment

                    Given
Decide      H1      H2      H3      H4
  D1       654      14       4      30
  D2        20     758       5       4
  D3        28       9     766      18
  D4        80       1       7     730

Table 5-3. Confusion Matrix [$e_{ij}$]; Experimental Data
from 782 Tests for Each Class, $\alpha_i$ Set Equal
to .025 for All i=1,2,3,4

                    Given
Decide      H1      H2      H3      H4
  D1       834      18       5      38
  D2        26     970       6       5
  D3        37      11     980      23
  D4       103       1       9     934

Table 5-4. Confusion Matrix [$e_{ij}$]; Experimental Data
Normalized to 1000

Table 5-5. Confusion Matrix [$e_{ij}$]; Experimental Data
from 637 Tests, $\alpha_i$ Set Equal to .0125 for
All i=1,2,3,4

Table 5-6. Confusion Matrix [$e_{ij}$]; Experimental Data
Normalized to 1000

                    Given
Decide      H1      H2      H3      H4
  D1       232     161     241     214
  D2        30      48      16      15
  D3        94      44      72      54
  D4       187      80     114     156

Table 5-7. Conditional Average Sample Number, $\alpha_i$ = 0.025,
Including the Samples with Zero Intersection

                    Given
Decide      H1      H2      H3      H4
  D1       314     192     233     323
  D2        55      67      21       0
  D3        88      42      88      99
  D4       290       0       0     214

Table 5-8. Conditional Average Sample Number, $\alpha_i$ = 0.0125,
Including the Samples with Zero Intersection

There are other pairs of numbers, such as 6 and 9, 
which this algorithm will have difficulty classifying. 

It is suggested that these similar shapes be classed 
into common families and other algorithms which are not 
invariant to rotation and mirror-imaging be used to sub- 
classify within each family. Application of the author’s 
algorithm first may simplify the subclassification
process.
CHAPTER VI


CONCLUSION


A. RESULTS

The fundamental characteristics of a classification
algorithm are identified as the false declaration rate
and the probability of detection. A sequential
multiclass hypothesis test is proposed in Chapter III. A
detailed study of the test shows that it terminates
almost surely, and its performance can be readily
controlled.

The sequential machine becomes a threshold tester
of certain functions of probability densities. The
threshold levels which determine the operating
characteristics of the pattern recognition machine are under
the control of the experimenter.

The input to this machine must necessarily be random
quantities. In Chapter IV invariant feature extraction
is developed. A few features extracted by random lines
are presented. They are used in Chapter V in experiments
that simulate pattern recognition machines. Results of
recognizing block letters and handwritten numerals are
presented.
B. APPLICATIONS AND FURTHER RESEARCH

It is anticipated that the new ideas and results
presented here will be the forerunner of broader
research and development in applications of multiclass
hypothesis testing to pattern recognition using
randomized features.

In particular the results of the experiment with
handwritten numerals indicate that inexpensive and fast
sorters can be built. Also the preliminary findings
indicate that the features used in the experiments may
be insensitive to changes in style. The applications
of such a system in the postal service or in business
may relieve much of the burden of hand sorting.

This method could be fruitfully applied in
understanding and solving the problems of machine reading
multifont and handwritten (script) matter.

Multiclass sequential tests that use random features
may fit well with techniques which use context
information. A reader that correlates at the word level,
for instance, does not need complete accuracy on each
individual letter. If a word like "California" is
anticipated, even a ten per cent error rate on each
individual letter can give highly accurate results.
GLOSSARY


A i 


threshold used in Wald's sequential probability
ratio test; also used by Armitage


a i 

C i 

5(Dj_ | v) 


false alarm probability 
probability of incorrectly deciding 

thresholds used in the proposed test 

probability of verifying when the ob- 
servation is v 


D i 

d i 

e 

e ij 


decision to reject 
decision i 

discriminant function for the ith class 
expectation 

probability of deciding $D_i$ given $H_j$ is true;
entries to the confusion matrix


F-i 


F, 


dF ± 


r 


probability distribution function of v^ 
given 

probability distribution function of
$v = (v_1, v_2, \ldots, v_n)$ given $H_i$

probability density function, if it exists,
of the observation v given $H_i$

observation space, $v \in \Gamma$


$\gamma_i$   probability of correctly deciding $D_i$
(detection probability)


Hi 

i>j,k 


hypothesis that the observation v is from 
the probability distribution 

often used as dummy index 
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& 

*1 



r 

s(P,0) 


t 


u i 

v 

wCi) 

x 

Z n U,j) 


Z n (l ,j>. 

) 

z m (i 5 j ) 


loss incurred by deciding $D_i$ when $H_j$ is true

number of hypotheses possible, $H_1, H_2, \ldots, H_M$

probability of $H_i$ being true

space of observation v e ip 

space of v in which decision Dj_ is made 
V £ ^ =) Di 

probability ratio used by Armitage 

average loss, risk 

random line specified by P and 0 

t e [0,1], parameter used by Chow to control 
the rejection rate 

ratio used in Reed's test 

observation, sample $v = (v_1, v_2, \ldots, v_n)$

a vector used to weigh $v = (v_1, v_2, \ldots, v_n)$
for the ith class

$x = (x_1, x_2, \ldots, x_n)$, an observation

product of probability ratios, $\prod_{m=1}^{n} z_m(i,j)$

logarithm of $Z_n(i,j)$

ratio of probabilities $dF_j(v_m)/dF_i(v_m)$
logarithm of $z_m(i,j)$
APPENDIX 1


THE EQUIVALENCE OF THE TWO FORMS OF THE PROPOSED TEST

The two forms of the test in Chapter III are really
the same test. For any i = 1, 2, ..., M

$$
\sum_{j \ne i} P_j\,dF_j(v) \ge C_i P_i\,dF_i(v)
\tag{1}
$$

if and only if

$$
\sum_{j=1}^{M} P_j\,dF_j(v) \ge (C_i + 1) P_i\,dF_i(v)
\tag{2}
$$

The statement of Equation 1 is the same as

$$
\sum_{j=1}^{M} P_j\,dF_j(v) \ge \max_i \{ (C_i + 1) P_i\,dF_i(v) \}
\tag{3}
$$

because $\{C_i\}$, $\{P_i\}$, $\{dF_i(v)\}$ are all positive numbers.
Now consider

$$
\min_i \Big[ \sum_{j \ne i} P_j\,dF_j(v) - C_i P_i\,dF_i(v) \Big]
\tag{4}
$$

under the condition of Equation 1. Then for any
k = 1, 2, ..., M

$$
\sum_{j \ne i} P_j\,dF_j(v) - C_i P_i\,dF_i(v)
\le \sum_{j \ne k} P_j\,dF_j(v) - C_k P_k\,dF_k(v)
\tag{5}
$$

Now cancelling equal terms and rearranging the terms gives

$$
-(C_i + 1) P_i\,dF_i(v) \le -(C_k + 1) P_k\,dF_k(v)
\tag{6}
$$

or

$$
(C_i + 1) P_i\,dF_i(v) = \max_j \{ (C_j + 1) P_j\,dF_j(v) \}
\tag{7}
$$

Hence, the two forms of the test differ only in the
computations which are performed.
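The equivalence can also be checked numerically. The sketch below is an illustration under assumptions: it reads the relations in Equations 1 and 3 as non-strict inequalities, uses exact rational arithmetic so that no rounding can blur the comparison, and draws hypothetical density values and thresholds at random. The function names are not from this report.

```python
from fractions import Fraction
import random

def continue_form_1(c, p, f):
    """Eq. 1 for every i: sum over j != i of P_j dF_j >= C_i P_i dF_i."""
    total = sum(pj * fj for pj, fj in zip(p, f))
    return all(total - p[i] * f[i] >= c[i] * p[i] * f[i] for i in range(len(p)))

def continue_form_2(c, p, f):
    """Eq. 3: sum over all j of P_j dF_j >= max over i of (C_i + 1) P_i dF_i."""
    total = sum(pj * fj for pj, fj in zip(p, f))
    return total >= max((c[i] + 1) * p[i] * f[i] for i in range(len(p)))

rng = random.Random(6)
agree = True
outcomes = set()
for _ in range(2000):
    m = 4
    p = [Fraction(1, m)] * m                                   # equally likely classes
    f = [Fraction(rng.randint(1, 20), 20) for _ in range(m)]   # hypothetical density values
    c = [Fraction(rng.randint(1, 40), 10) for _ in range(m)]   # hypothetical thresholds
    r1, r2 = continue_form_1(c, p, f), continue_form_2(c, p, f)
    agree = agree and (r1 == r2)
    outcomes.add(r1)
```

The second form needs only one sum and one maximum per observation, which is the computational difference referred to above.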
APPENDIX 2


THE CHERNOFF BOUND

The Chernoff bound uses the fact that an exponential
curve bounds a step function.

Fig. A-1. Chernoff Bound

$$
\frac{1 - \operatorname{sign}(x-a)}{2} < \exp\{-\lambda(x-a)\}, \qquad \lambda > 0
\tag{1}
$$

$$
\operatorname{Prob}\{x < a\} = \int \frac{1 - \operatorname{sign}(x-a)}{2}\,dF(x)
\tag{2}
$$

$$
< \int \exp\{-\lambda(x-a)\}\,dF(x)
\tag{3}
$$

$$
= \mathcal{E}\{\exp(-\lambda(x-a))\}
\tag{4}
$$
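The bound of Equations 1 to 4 can be illustrated on simulated data. In this sketch the Gaussian distribution, the threshold a = 0, and the value of lambda are arbitrary assumptions chosen for the example; the function names are hypothetical.

```python
import math
import random

def step(x, a):
    """(1 - sign(x - a)) / 2: one below a, zero above a, one half at a."""
    if x < a:
        return 1.0
    return 0.5 if x == a else 0.0

def chernoff_terms(samples, a, lam):
    """Empirical Prob{x < a} and the empirical bound E{exp(-lam (x - a))}."""
    n = len(samples)
    prob = sum(1 for x in samples if x < a) / n
    bound = sum(math.exp(-lam * (x - a)) for x in samples) / n
    return prob, bound

rng = random.Random(5)
samples = [rng.gauss(1.0, 1.0) for _ in range(10000)]   # hypothetical distribution F
prob, bound = chernoff_terms(samples, a=0.0, lam=1.0)
pointwise_ok = all(step(x, 0.0) <= math.exp(-1.0 * (x - 0.0)) for x in samples)
```

Since the exponential dominates the step function at every sample, the empirical probability can never exceed the empirical bound, whatever distribution is substituted.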


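An error occurs if decision $D_k$ is made when $H_i$ is
true, $i \ne k$. Then

$$
\sum_{j \ne k} P_j\,dF_j(v) < C_k P_k\,dF_k(v), \qquad k \ne i
$$

and in particular

$$
P_i\,dF_i(v) < C_k P_k\,dF_k(v), \qquad k \ne i
$$

or

$$
\frac{dF_i(v)}{dF_k(v)} < \frac{C_k P_k}{P_i}
$$

Hence the probability of error is bounded by

$$
\operatorname{Prob}\{D_k \mid H_i \text{ true}\}
= \operatorname{Prob}\Big\{ \sum_{j \ne k} P_j\,dF_j(v) < C_k P_k\,dF_k(v) \Big\}
\tag{5}
$$

$$
\le \operatorname{Prob}\{ P_i\,dF_i(v) < C_k P_k\,dF_k(v) \}
\tag{6}
$$

$$
= \operatorname{Prob}\Big\{ \frac{dF_i(v)}{dF_k(v)} < \frac{C_k P_k}{P_i} \Big\}
\tag{7}
$$

$$
= \operatorname{Prob}\Big\{ \ln\frac{dF_i(v)}{dF_k(v)} < \ln\frac{C_k P_k}{P_i} \Big\}
\tag{8}
$$

$$
\le \mathcal{E} \exp\Big\{ -\lambda \ln\frac{dF_i(v)}{dF_k(v)} + \lambda \ln\frac{C_k P_k}{P_i} \Big\}
$$

$$
= \exp\Big\{ \lambda \ln\frac{C_k P_k}{P_i} \Big\}\,
\mathcal{E}\Big\{ \Big[ \frac{dF_k(v)}{dF_i(v)} \Big]^{\lambda} \Big\}
\tag{9--13}
$$

or, where a is some constant,

$$
\operatorname{Prob}\{D_k \mid H_i \text{ true}\} \le a\, \mathcal{E}_{H_i}\Big\{ \Big[ \frac{dF_k(v)}{dF_i(v)} \Big]^{\lambda} \Big\}
\tag{14}
$$

But by the arguments of Chapter III, Section C,

$$
\frac{dF_k(v)}{dF_i(v)} \to 0
\tag{15}
$$

The conclusion is that as n becomes large, errors
of the type $e_{ki}$, $k \ne i$, when $H_i$ is true, approach 0.

$$
\operatorname{Prob}\{D_k \mid H_i \text{ true}\} \to 0
\tag{16}
$$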
APPENDIX 3


AVERAGE SAMPLE NUMBER

The computation of the expected number of samples
before a test ends, for a two-class test, is well known.
The theorem given here is after Theorem 9-1 of Selin
[Ref. 4].

Two computational devices are used. The first is

$$
\sum_{i=1}^{\infty} \sum_{j=1}^{i} = \sum_{j=1}^{\infty} \sum_{i=j}^{\infty}
\tag{1}
$$

The second is

$$
\sum_{i=1}^{\infty} P(N \ge i) = \mathcal{E}(N)
\tag{2}
$$

where N is a positive valued integer random variable.
This can easily be seen by writing out a few terms of
the expectation.

$$
\mathcal{E}(N) = P(N=0) \cdot 0 + P(N=1) \cdot 1 + P(N=2) \cdot 2 + \ldots
\tag{3}
$$

$$
\begin{aligned}
= {} & P(N=1) + P(N=2) + P(N=3) + \ldots \\
     & \quad + P(N=2) + P(N=3) + \ldots \\
     & \quad + P(N=3) + \ldots
\end{aligned}
\tag{4}
$$

Summing the terms row by row,

$$
\mathcal{E}(N) = P(N \ge 1) + P(N \ge 2) + \ldots
\tag{5}
$$

$$
= \sum_{i=1}^{\infty} P(N \ge i)
\tag{6}
$$


Theorem: If the test ends at the Nth observation
and $\mathcal{E}(N) < \infty$, then

$$
\mathcal{E}(N) = \frac{\mathcal{E}(Z_N(i,j))}{\mathcal{E}(z(i,j))}
\tag{7}
$$

where

$$
Z_N(i,j) = \sum_{n=1}^{N} \ln\frac{dF_j(v_n)}{dF_i(v_n)} = \sum_{n=1}^{N} z_n(i,j)
\tag{8}
$$

Proof: Since N is itself a random variable,

$$
\mathcal{E}(Z_N(i,j)) = \sum_{k=1}^{\infty} P(N=k)\,\mathcal{E}(Z_k(i,j) \mid N=k)
\tag{9}
$$

$$
= \sum_{k=1}^{\infty} P(N=k) \sum_{m=1}^{k} \mathcal{E}(z_m(i,j) \mid N=k)
\tag{10}
$$

$$
= \sum_{m=1}^{\infty} \sum_{k=m}^{\infty} P(N=k)\,\mathcal{E}(z_m(i,j) \mid N=k)
\tag{11}
$$

$$
= \sum_{m=1}^{\infty} P(N \ge m)\,\mathcal{E}(z_m(i,j) \mid N \ge m)
\tag{12}
$$

The event $N \ge m$ can occur only if the test has not ended
by the (m-1)th observation, and hence this event is
independent of $z_m(i,j)$. Therefore,

$$
\mathcal{E}(Z_N(i,j)) = \sum_{m=1}^{\infty} P(N \ge m)\,\mathcal{E}(z(i,j))
\tag{13}
$$

$$
= \mathcal{E}(z(i,j)) \sum_{m=1}^{\infty} \sum_{k=m}^{\infty} P(N=k)
\tag{14}
$$

$$
= \mathcal{E}(z(i,j)) \sum_{k=1}^{\infty} \sum_{m=1}^{k} P(N=k)
\tag{15}
$$

$$
= \mathcal{E}(z(i,j)) \sum_{k=1}^{\infty} k\,P(N=k)
\tag{16}
$$

$$
= \mathcal{E}(z(i,j))\,\mathcal{E}(N)
\tag{17}
$$
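The identity of Equation 7 can be checked by simulation. The sketch below is illustrative only: the four possible values of the per-sample log ratio, their probabilities, and the stopping limits of ±ln 9 are assumptions made for the example, and the function name `run_until_boundary` is hypothetical.

```python
import math
import random

def run_until_boundary(draw_z, upper, lower):
    """Accumulate z_1 + z_2 + ... until the sum leaves (lower, upper); return (Z_N, N)."""
    total, n = 0.0, 0
    while lower < total < upper:
        total += draw_z()
        n += 1
    return total, n

rng = random.Random(3)
# Hypothetical per-sample log ratios z and their probabilities under the true class.
z_values = [math.log(4.0), math.log(1.5), math.log(2.0 / 3.0), math.log(0.25)]
z_probs = [0.4, 0.3, 0.2, 0.1]
mean_z = sum(z * p for z, p in zip(z_values, z_probs))          # E(z)
draw_z = lambda: rng.choices(z_values, z_probs)[0]

runs = [run_until_boundary(draw_z, math.log(9.0), -math.log(9.0)) for _ in range(20000)]
mean_zn = sum(z for z, _ in runs) / len(runs)                   # estimate of E(Z_N)
mean_n = sum(n for _, n in runs) / len(runs)                    # estimate of E(N)
asn_from_identity = mean_zn / mean_z                            # right side of Eq. 7
```

Under these assumptions the directly observed average sample number and the value obtained from the identity should agree to within sampling error.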
APPENDIX 4


UNIFORM RANDOM LINES

A line can be named uniquely [Ref. 20, p. 14] by
u and v and the equation,

$$
ux + vy + 1 = 0
\tag{1}
$$

This representation excludes lines which pass through
the origin. This is of little concern since the measure
of all such lines is zero.

Fig. A-2. Definition of a Random Line

Another representation of a line may use the polar
coordinates of the point on the line nearest the origin.
The equation of the line is

$$
x\,\frac{\cos\phi}{-P} + y\,\frac{\sin\phi}{-P} + 1 = 0
\tag{2}
$$

A rotation $\theta$ and translation (a,b) in the two
dimensional plane can be represented by

$$
\begin{aligned}
X &= a + x\cos\theta - y\sin\theta \\
Y &= b + x\sin\theta + y\cos\theta
\end{aligned}
\tag{3}
$$

where $0 \le \theta < 2\pi$.

The parameters change and the new line is

$$
x\,\frac{\cos\phi'}{-P'} + y\,\frac{\sin\phi'}{-P'} + 1 = 0
\tag{4}
$$

where

$$
\begin{aligned}
P' &= P - a\cos\phi - b\sin\phi \\
\phi' &= \phi - \theta
\end{aligned}
\tag{5}
$$

This can be shown by substituting Equation 3 into Equation
2 and collecting terms to obtain Equation 4.

Let a set of random lines be denoted by E. A
transformation places these lines in $E_T$. Random lines are
uniformly distributed when $P(E) = P(E_T)$. It is shown
in Kendall and Moran [Ref. 20] that uniform random lines
are possible if $P$ and $\phi$ are each uniform.

A uniformly distributed random line may be generated
by choosing $P$ and $\phi$ uniformly. The line so chosen is
Equation 2. These lines will intersect a circular retina
of radius R. The circular retina is the area in which
all observations are confined if $0 < P < R$ and $0 < \phi < 2\pi$.
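The translation invariance of lines generated this way can be illustrated by counting how often they strike a small disk placed at two different positions. The sketch is an assumption-laden example: the disk radius, its two centers, the unit retina radius, and the function names are all chosen for illustration.

```python
import math
import random

def hits_disk(p, phi, cx, cy, r):
    """True if the line x*cos(phi) + y*sin(phi) = p passes within r of (cx, cy)."""
    return abs(cx * math.cos(phi) + cy * math.sin(phi) - p) <= r

def hit_fraction(cx, cy, r, retina_radius, n_lines, rng):
    """Fraction of uniform random lines that intersect the disk."""
    hits = 0
    for _ in range(n_lines):
        p = retina_radius * (1.0 - rng.random())      # 0 < P <= R
        phi = 2.0 * math.pi * rng.random()            # 0 <= phi < 2*pi
        hits += hits_disk(p, phi, cx, cy, r)
    return hits / n_lines

rng = random.Random(4)
at_origin = hit_fraction(0.0, 0.0, 0.3, 1.0, 40000, rng)
displaced = hit_fraction(0.5, 0.2, 0.3, 1.0, 40000, rng)
```

Both fractions should be close to r/R = 0.3, the ratio of the disk perimeter to the retina perimeter, provided the disk lies entirely inside the retina as it does for the two centers assumed here.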